[P] mark -- Mark observations for inclusion
Syntax
    Create marker variable after syntax

        marksample lmacname [, novarlist strok zeroweight noby]

    Create marker variable

        mark newmarkvar [if] [in] [weight] [, zeroweight noby]

    Modify marker variable

        markout markvar [varlist] [, strok sysmissok]

    Find range containing selected observations

        markin [if] [in] [, name(lclname) noby]

    Modify marker variable based on survey-characteristic variables

        svymarkout markvar

aweights, fweights, iweights, and pweights are allowed; see weight.
varlist may contain time-series operators; see tsvarlist.
Description
marksample, mark, and markout are for use in Stata programs. marksample
and mark are alternatives; marksample links to information left behind by
syntax, and mark is seldom used. Both create a 0/1 to-use variable that
records which observations are to be used in subsequent code. markout
sets the to-use variable to 0 in observations for which any of the
variables in varlist contain missing values; it is used to further
restrict the observations.
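The pattern described above might be sketched as follows. The command name
mysum and its group() option are hypothetical names chosen for illustration;
they do not come from this entry.

```stata
program mysum
    version 16
    // mysum and group() are hypothetical names used only for illustration
    syntax varlist(numeric) [if] [in] [, GRoup(varlist)]

    // touse is 1 where if/in hold and no variable in varlist is missing
    marksample touse

    // further set touse to 0 where any group() variable is missing;
    // strok lets group() contain string variables
    markout `touse' `group', strok

    summarize `varlist' if `touse'
end
```

For example, after sysuse auto, typing mysum price mpg, group(rep78) would
leave out the observations in which rep78 is missing.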
markin is for use after marksample, mark, and markout and, sometimes,
provides a more efficient encoding of the observations to be used in
subsequent code. markin is rarely used.
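A minimal sketch of where markin would sit in a program, assuming that it
stores an in range covering the selected observations in the local macro in
(the default macro name described under name() below). The command name
mysum2 is hypothetical.

```stata
program mysum2
    version 16
    // mysum2 is a hypothetical name used only for illustration
    syntax varlist(numeric) [if] [in]
    marksample touse

    // assumed behavior: local macro `in' now holds an in range that
    // covers every observation for which touse is 1
    markin if `touse'

    // `touse' is still required; the range may include unselected observations
    summarize `varlist' `in' if `touse'
end
```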
svymarkout sets the to-use variable to 0 wherever any of the
survey-characteristic variables contain missing values; it is discussed
in [SVY] svymarkout and is not further discussed here.
Options
novarlist is for use with marksample. It specifies that missing values
among variables in varlist not cause the marker variable to be set to
0. Specify novarlist if you previously specified
syntax newvarlist ...
or
syntax newvarname ...
You should also specify novarlist when missing values are not to
cause observations to be excluded (perhaps you are analyzing the
pattern of missing values).
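As a sketch of the first case, consider a hypothetical command mygen that
creates a new variable from an expression. The command name is an assumption;
the macros varlist, typlist, and exp are those left behind by syntax.

```stata
program mygen
    version 16
    // hypothetical usage: mygen newvar = exp [if] [in]
    syntax newvarname =/exp [if] [in]

    // varlist names a variable that does not exist yet, so marksample
    // must not check it for missing values
    marksample touse, novarlist

    generate `typlist' `varlist' = `exp' if `touse'
end
```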
strok is used with marksample or markout. Specify this option if string
variables in varlist are to be allowed. strok changes rule 6 in
Remarks below to read
"The marker variable is set to 0 in observations for which any of the
string variables in varlist contain ""."
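A short sketch of strok with marksample, using a hypothetical command
countuse that reports how many observations would be used:

```stata
program countuse, rclass
    version 16
    // countuse is a hypothetical name used only for illustration
    syntax varlist [if] [in]

    // without strok, a string variable in varlist would set touse to 0
    // in all observations (rule 6); with strok, only observations with
    // numeric missing values or "" in a string variable are excluded
    marksample touse, strok

    quietly count if `touse'
    local n = r(N)
    display as text "observations used: " as result `n'
    return scalar N = `n'
end
```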
zeroweight is for use with marksample or mark. It deletes rule 1 in
Remarks below, meaning that observations will not be excluded because
the weight is zero.
noby is rarely used and applies only in byable(recall) programs. It
specifies that the restriction to the current by-group be ignored when
identifying the sample, so mark and marksample create the marker
variable as they would have had the user not specified the by prefix.
If the user did not specify the by prefix, noby has no effect. noby
gives byable(recall) programs a way to identify the overall sample.
For instance, a program that calculates the percentage of observations
in the by-group needs to know both the sample to be used on this call
and the overall sample. The program might be coded as
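        program ..., byable(recall)
            ...
            // sample for the current by-group
            marksample touse
            // overall sample, ignoring the by-group restriction
            marksample alluse, noby
            ...
            quietly count if `touse'
            local curN = r(N)
            quietly count if `alluse'
            local totN = r(N)
            // proportion of the overall sample in this by-group
            // (multiply by 100 for a percentage)
            local frac = `curN'/`totN'
            ...
        end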
See [P] byable.
sysmissok is used with markout. Specify this option if numeric variables
in varlist equal to system missing (.) are to be allowed and only
numeric variables equal to extended missing (.a, .b, ...) are to be
excluded. The default is that all missing values (., .a, .b, ...)
are excluded.
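A sketch of sysmissok with markout, assuming a hypothetical command mysum3
with a hypothetical extra() option whose variables may be system missing (.)
but not extended missing:

```stata
program mysum3
    version 16
    // mysum3 and extra() are hypothetical names used only for illustration
    syntax varlist(numeric) [if] [in] [, EXtra(varlist numeric)]
    marksample touse

    // observations where an extra() variable equals . are kept;
    // observations where it equals .a, .b, ... are excluded
    markout `touse' `extra', sysmissok

    summarize `varlist' if `touse'
end
```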
name(lclname) is for use with markin. It specifies the name of the local
macro to be created. If name() is not specified, the macro is named in.
Remarks
Regardless of whether you use mark or marksample, followed or not by
markout, the following rules apply:
1. The marker variable is set to 0 in observations for which weight is 0
(but see option zeroweight).
2. If weight is invalid (such as being less than 0 in some observation
or being a noninteger for frequency weights), the appropriate error
message is issued and execution stops.
3. The marker variable is set to 0 in observations for which the if exp
is not satisfied.
4. The marker variable is set to 0 in observations outside the in range.
5. The marker variable is set to 0 in observations for which any of the
numeric variables in varlist contain a numeric missing value.
6. The marker variable is set to 0 in all observations if any of the
variables in varlist are strings; see option strok for an exception.
7. The marker variable is set to 1 in the remaining observations.
Using the name touse is a convention, not a rule, but it is recommended
for consistency between programs.
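To see how the rules divide between the commands, the following sketch
builds the marker variable with mark and markout instead of marksample. The
program name myprog is hypothetical, and the comments assign rules to
commands on the assumption that mark handles the weight, if, and in
restrictions while markout handles missing values.

```stata
program myprog
    version 16
    // myprog is a hypothetical name used only for illustration
    syntax varlist [if] [in] [fweight]

    tempvar touse
    // rules 1-4: zero or invalid weights, the if exp, and the in range
    mark `touse' `if' `in' [`weight'`exp']
    // rules 5-6: missing values and string variables in varlist
    markout `touse' `varlist'

    // rule 7: touse is 1 in the remaining observations
    quietly count if `touse'
    display as text "observations used: " as result r(N)
end
```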
Example
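        program ...
            syntax ...
            // create the 0/1 marker variable from what syntax parsed
            marksample touse
            ...
            // restrict every subsequent command to the marked sample
            ... if `touse' ...
            ...
        end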
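One way to fill in this skeleton, shown only as a sketch: a hypothetical
command mymean that reports the mean of one numeric variable over the marked
sample. The command name and the choice of summarize are assumptions made
for illustration.

```stata
program mymean, rclass
    version 16
    // mymean is a hypothetical name used only for illustration
    syntax varname(numeric) [if] [in] [fweight aweight]
    marksample touse

    // stop with the standard "no observations" error if nothing is marked
    quietly count if `touse'
    if r(N) == 0 {
        error 2000
    }

    // every later command is restricted to the marked sample
    summarize `varlist' if `touse' [`weight'`exp'], meanonly
    local mean = r(mean)
    display as text "mean of `varlist': " as result `mean'
    return scalar mean = `mean'
end
```

For example, after sysuse auto, typing mymean price if foreign would report
the mean of price among the observations for which foreign is nonzero.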