__Title__

**[U] 12.2.1 Missing values**

__Description__

Stata has 27 numeric missing values:

**.**, the default, which is called the "system missing value" or **sysmiss**

and

**.a**, **.b**, **.c**, ..., **.z**, which are called the "extended missing values".

Numeric missing values are represented by large positive values. The
ordering is

all nonmissing numbers < **.** < **.a** < **.b** < ... < **.z**

Thus, the expression **age > 60** is true if variable **age** is greater than 60
or missing.

To exclude missing values, ask whether the value is less than "**.**". For
instance,

**. list if age > 60 & age < .**

To select only missing values, ask whether the value is greater than or equal
to "**.**". For instance,

**. list if age >= .**
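
As a minimal do-file sketch of the rules above, the following builds a hypothetical four-observation dataset in which **age** holds two nonmissing values, sysmiss **.**, and the extended missing value **.a**. The data are made up for illustration; the comments give the counts that follow from the ordering rule. Run it as a do-file, because **//** comments are not accepted interactively.

```stata
* Hypothetical data: two nonmissing ages, one sysmiss, one extended missing
clear
input age
45
70
.
.
end
replace age = .a in 4

count if age > 60             // 3: the 70, plus . and .a
count if age > 60 & age < .   // 1: only the 70
count if age >= .             // 2: . and .a
```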

Stata has one string missing value, which is denoted by **""** (the empty string).

__Remarks__

More details concerning missing values and their treatment in Stata are
provided under the following headings:

Overview
Expressions
Operators
Functions
Matrices
Useful commands
Value labels
Estimation commands
Technical note: checking if a value is missing

__Overview__

1. Stata supports different types of numeric missing values that can be
used to specify different reasons that a value is unknown. The most
frequently used missing value **.**, referred to as sysmiss, is nearly
always generated by Stata when it cannot assign a specific value.
The 26 extended missing values **.a**, **.b**, ..., **.z** are available to users
requiring more elaborate tracking of missing values.

Empty strings are treated as missing values of type string.

2. Numeric missing values are represented by large positive values.
This means that an expression such as **income > 100** evaluates to **true**
for missing values of the variable **income**, as well as for values
greater than 100. Also, the qualifier **if** *varname* is true for all
nonzero values of *varname*, including missing values.

3. The ordering of missing values is

*all nonmissing numbers* < **.** < **.a** < **.b** < ... < **.z**

4. Most Stata statistical commands deal with missing values by
disregarding observations with one or more missing values (called
"listwise deletion" or "complete cases only").

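A minimal sketch of points 1 and 2, assuming a hypothetical survey variable **income** in which **.a** records a refusal and **.b** a "don't know" answer (this coding is an illustrative choice, not a Stata convention):

```stata
clear
input income
1200
0
.
.
end
replace income = .a in 3   // hypothetical code: refused
replace income = .b in 4   // hypothetical code: don't know

tabulate income, missing             // .a and .b appear as separate categories
count if income                      // 3: 1200, .a, and .b are all nonzero
count if income & !missing(income)   // 1: only 1200
count if missing(income)             // 2
```
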
__Expressions__

Expressions occur in many places in Stata (see **[P] syntax** and exp). For
example,

**. generate** *newvarname* **=** *exp*

evaluates the expression *exp* for each observation and stores the result
in the new variable *newvarname*. An observation of *newvarname* is set to
missing if *exp* evaluates to missing for that observation.

Expressions are also used to restrict a command's operation to a subset
of the observations. For instance,

**. summarize** *varname* **if** *exp*

summarizes *varname* by using all observations for which *exp* evaluates to
true (nonzero). Because missing values are nonzero, this includes
observations for which *exp* evaluates to missing.
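
The following sketch, using made-up variables **x**, **y**, and **w**, contrasts the two cases: an arithmetic expression yields missing, whereas a relational expression never does, and an **if** condition that compares a missing value with a number is true.

```stata
clear
input x y w
2 4 10
3 . 20
. 5 30
end

generate s = x + y                 // arithmetic: 6, ., .
generate big = x > 2               // relational: 0, 1, 1 (missing x compares as > 2)
generate big_nm = x > 2 if x < .   // 0, 1, . (observations with missing x left missing)

summarize w if x > 2               // uses observations 2 and 3: r(N) = 2
summarize w if x > 2 & x < .       // uses observation 2 only: r(N) = 1
```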

__Operators__

The relational operators (see operators) interpret missing values as
large positive numbers (see above). Thus, all the following evaluate to
true:

**73 < .** **. == .** **.a == .a**
**.a != .** **.a < .b** **.a <= .b**

whereas all the following evaluate to false:

**73 >= .** **. == .a** **. > .a**

The arithmetic operators (**+**, **-**, and so on) return missing if any of
their arguments is missing.
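
The comparisons listed above can be verified with **assert**, which stops with an error if its expression is false. This is a do-file sketch; one empty observation is created first so that each expression is evaluated.

```stata
* One observation so that assert evaluates each expression once
clear
set obs 1

* Relational operators: all true
assert 73 < .
assert . == .
assert .a == .a
assert .a != .
assert .a < .b
assert .a <= .b

* Relational operators: all false
assert !(73 >= .)
assert !(. == .a)
assert !(. > .a)

* Arithmetic operators propagate missing values
assert missing(1 + .)
assert missing(2 * .b)
```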

__Functions__

Stata has a few special functions for dealing with missing values:

**missing()** returns 1 (meaning true) if any of its arguments,
numeric or string, evaluates to missing and 0
(meaning false) otherwise.

**mi()** is a shorthand for **missing()**.

**matmissing(***K***)** returns 1 (meaning true) if any elements of the
matrix *K* are missing and 0 (meaning false)
otherwise.

Some Stata functions interpret **.** in a special way. For instance, the
function **inrange(x,a,b)** returns 1 if **x** lies in the interval **[a,b]**
and 0 otherwise. It interprets a missing lower bound **a** as -infinity and a
missing upper bound **b** as +infinity, and it returns 0 if **x** itself is
missing. These special interpretations are discussed in functions.

Other Stata functions return missing (**.**) if one or more of the arguments
are missing or invalid.
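
A short sketch of these functions; the variables **x** and **name** are hypothetical, and **display** shows the value of each expression in the first observation.

```stata
clear
set obs 1
generate x = .
generate str5 name = ""

display missing(x)           // 1
display missing(3, "abc")    // 0: no argument is missing
display missing(3, name)     // 1: "" is the string missing value
display mi(.z)               // 1
display inrange(5, ., 10)    // 1: missing lower bound acts as -infinity
display inrange(5, 1, .)     // 1: missing upper bound acts as +infinity
display inrange(., 1, 10)    // 0: a missing x is never in range
display sqrt(.)              // .
```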

__Matrices__

Matrices may contain all types of missing values. The matrix operators
(see matrix operators)

**-** negate
**'** transpose

**\** row join
**,** column join
**+** add
**-** subtract
***** multiply (including multiply by scalar)
**/** division by scalar
**#** Kronecker product

generate missing values elementwise.

In the matrix product **C=A*B**, **C**[*i*,*j*] is missing if row *i* of **A** or column *j*
of **B** contains a missing value.

Matrix division by scalar **C=A/b** is not allowed if the scalar **b** is a
missing value. Otherwise, missing values in matrix **A** generate missing
values in **C** elementwise.

Like the **list** command, the **matrix list** command has a **nodotz** option to
display the extended missing value **.z** as a blank rather than as "**.z**".
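
A sketch of these rules, with matrices **A**, **B**, **C**, and **D** made up for illustration:

```stata
clear
set obs 1

matrix A = (1, 2 \ 3, 4)
matrix A[2,1] = .z           // put an extended missing value in row 2 of A
matrix B = I(2)              // 2 x 2 identity matrix

matrix C = A * B             // all of row 2 of C is missing
matrix D = A + B             // elementwise: only D[2,1] is missing
matrix list C
matrix list A, nodotz        // .z is shown as a blank
display matmissing(A)        // 1
```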

__Useful commands__

-------------------------------------------------------------------------
**mvencode** changes missing values into numeric values
**mvdecode** changes numeric values into missing values
**codebook** provides extensive information about variables,
including the occurrence of simple and extended
missing values
**misstable** tabulates missing values
**egen, rownonmiss()** number of nonmissing values in a varlist, per observation
**egen, rowmiss()** number of missing values in a varlist, per observation
**recode** recodes a variable, optionally into a new variable,
with special facilities to recode missing values
**mi** multiple imputation of missing values
**xtdescribe** describes participation patterns in panel data
-------------------------------------------------------------------------
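
As an illustration of several of these commands, the sketch below assumes hypothetical survey variables in which **-99** and **-98** were used as missing-value codes; the codes and their mapping to **.a** and **.b** are assumptions, not a standard.

```stata
clear
input income x1 x2
1200 1 -99
-99  2   3
-98  .   4
end

mvdecode income x2, mv(-99=.a \ -98=.b)   // codes -> extended missing values
egen nmiss = rowmiss(income x1 x2)        // per observation: 1, 1, 2
egen nvalid = rownonmiss(income x1 x2)    // per observation: 2, 2, 1
misstable summarize income x1 x2
mvencode income x2, mv(.a=-99 \ .b=-98)   // and back to the original codes
```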

__Value labels__

It is possible to define value labels for the extended missing values **.a**
to **.z**, but not for sysmiss **.**. These value labels show up in the same way
as value labels for nonmissing values. See **[D] label**.
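
A minimal sketch, with a hypothetical variable **income** and a hypothetical label name **incomelbl**:

```stata
clear
input income
1200
.
.
end
replace income = .a in 2
replace income = .b in 3

label define incomelbl .a "Refused" .b "Don't know"
label values income incomelbl
list income                  // observations 2 and 3 show the labels
tabulate income, missing
```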

__Estimation commands__

Most Stata commands ignore observations that are missing in one or more
of the variables referred to in the command. For instance, the
regression command **regress** disregards all observations that have a
missing value for the dependent variable or missing values for any of the
independent variables. This method is known as "listwise deletion" or
"complete cases only". It is statistically appropriate only if the values
are missing "at random". Expressions in an **if** qualifier or in a weight
expression are evaluated with the operator and function rules described
above, so missing values in them are treated accordingly.
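
The sketch below uses the **auto** dataset shipped with Stata, in which **rep78** is missing for 5 of the 74 cars. It shows listwise deletion in **regress**, reproduces the estimation sample with **mark** and **markout**, and shows that an **if** qualifier such as **rep78 > 3** also selects the missing values.

```stata
sysuse auto, clear

* Listwise deletion: observations with missing rep78 are dropped
regress price mpg rep78
display e(N)                             // 69

* Reproduce the estimation sample by hand
mark touse
markout touse price mpg rep78
count if touse                           // 69

* rep78 > 3 is also true where rep78 is missing
count if rep78 > 3                       // 34
count if rep78 > 3 & !missing(rep78)     // 29
```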

Stata commands that treat multiple observations as belonging to one
observational unit (for example, the observations of a panel in xt models
or the episodes in st models) ignore only those observations within the
"group" that have missing values, not the whole group.

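A sketch with a small, made-up panel of three units observed in three periods, where **y** is missing for unit 1 in period 2. The fixed-effects regression drops that single observation but keeps unit 1:

```stata
clear
input id t y x
1 1 3 1
1 2 . 2
1 3 5 3
2 1 2 1
2 2 4 2
2 3 7 4
3 1 1 2
3 2 3 3
3 3 4 5
end

xtset id t
xtreg y x, fe
display e(N)      // 8: only the observation with missing y is dropped
display e(N_g)    // 3: unit 1 still contributes its two complete observations
```
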
__Technical note: checking if a value is missing__

You might think you can test whether an expression or variable *exp* is
missing with the expression *exp***==.**. Remember, however, that Stata has 27
different missing values (**.**, **.a**, **.b**, ..., **.z**).

*exp***==.** means that the expression *exp* equals a specific missing value,
namely, sysmiss **.**. *exp***==.** returns false if *exp* equals one of the
extended missing-value types such as **.a** or **.z**. To test whether *exp* is
missing, that is, equals either **.** or one of the extended missing values,
one should use the expression

*exp* **>= .**
or
**missing(***exp***)**

which can be abbreviated to

**mi(***exp***)**

To test whether *exp* is not missing, use one of the following forms:

*exp* **< .**
**!missing(***exp***)**
**!mi(***exp***)**

An advantage of the last two forms is that the functions **missing()** and
**mi()** accept multiple (numeric or string) arguments, so that, for example,
**!missing(x, y)** is true only if neither **x** nor **y** is missing.
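
The difference between these tests can be seen on a hypothetical variable **x** that holds a nonmissing value, sysmiss, and **.a**; the hypothetical variable **y** is added to show the multiple-argument form.

```stata
clear
input x
1
.
.
end
replace x = .a in 3
generate y = 5

count if x == .              // 1: only sysmiss; the .a is not caught
count if x >= .              // 2
count if missing(x)          // 2
count if x < .               // 1
count if !missing(x)         // 1
count if !missing(x, y)      // 1: x and y are both nonmissing only in obs 1
```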