From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: details of how the impute command work
Date: Wed, 07 Nov 2007 08:37:17 -0600

```
Tim Hale <timhale@mindspring.com> writes,

> I am trying to figure out exactly what the -impute- command in Stata
> does to estimate missing values.

-impute y x1 x2 ..., gen(yhat)- creates

yhat_j = y_j            in observations j for which y_j<.
       = prediction_j   otherwise

Each prediction_j is the predicted value from a linear regression of y on
the subset of the variables x1, x2, ... that are not missing in
observation j.

Consider the following dataset:

. list y x1 x2

     +-------------+
     | y   x1   x2 |
     |-------------|
  1. | 1    2    3 |
  2. | 4    5    6 |
  3. | 5    5    8 |
  4. | 5    6    6 |
  5. | .    5    6 |
     |-------------|
  6. | .    .    3 |
     +-------------+

and the command

. impute y x1 x2, gen(yhat)

Then yhat in observation 5 would be based on a regression of y on x1 and x2,
because neither x1 nor x2 is missing in observation 5.  This amounts to

. regress y x1 x2
. predict prediction
. replace yhat = prediction in 5

The yhat in observation 6 would be based on a regression of y on x2, because
x1 is missing in observation 6.  This amounts to

. regress y x2
. predict prediction
. replace yhat = prediction in 6

-- Bill
wgould@stata.com
```
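
For anyone who wants to try the two cases above by hand, here is a minimal do-file sketch. It enters the six-observation example dataset and fills in yhat with the two regressions described in the post. It is not -impute-'s own code, only an illustration of the logic as explained. The variable names p_x1x2 and p_x2 are made up for this sketch, and the -if- conditions stand in for the "in 5" and "in 6" qualifiers so the same lines would work on data with more rows.

```stata
* Sketch only: reproduces the two regressions from the post by hand.
* p_x1x2 and p_x2 are hypothetical names chosen for this example.
clear
input y x1 x2
1 2 3
4 5 6
5 5 8
5 6 6
. 5 6
. . 3
end

* Start from the observed values; yhat is missing wherever y is missing.
generate yhat = y

* Case 1: x1 and x2 both present (observation 5). Regress y on x1 and x2.
regress y x1 x2
predict p_x1x2
replace yhat = p_x1x2 if missing(y) & !missing(x1, x2)

* Case 2: x1 missing, x2 present (observation 6). Regress y on x2 alone.
regress y x2
predict p_x2
replace yhat = p_x2 if missing(y) & missing(x1) & !missing(x2)

list y x1 x2 yhat
```

Using a separate prediction variable for each regression avoids the "already defined" error that a second -predict- into the same variable name would raise. Each -regress- uses only the complete cases for the variables it names, so both fits here are estimated from observations 1 through 4.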