st: RE: Re: Missing Data

From: "Nick Cox"
Subject: st: RE: Re: Missing Data
Date: Mon, 8 Dec 2003 14:20:58 -0000

```
My reading of this problem differs from others'.
As I see it, what is wanted is to fill in the gaps
using the other data for the same individual.

-sex- and -empstatus- are not quite on the same footing here. In
most datasets it would be accurate to assume that individuals have
one sex or the other throughout a study, so that

. egen Sex = max(sex), by(id)
. bysort id : replace sex = Sex if mi(sex)

would be adequate. We could check the underlying assumption here
in various ways, such as

. egen Sex1 = min(sex), by(id)
. egen Sex2 = max(sex), by(id)
. assert Sex1 == Sex2

A key feature of -egen, max()- in this problem is that the result
is missing if and only if all the arguments are missing.

Employment status is different, as changes of status could be
common, but there are various ways of guessing at gaps: you don't
say which you regard as acceptable.

One way is just a "copying downwards" technique, discussed in some
detail at http://www.stata.com/support/faqs/data/missing.html

Note that blocks of missing values at the start of each panel need
special attention.

A more conservative method is to fill in a gap if and only if the
status at either end of the gap is the same. So you would fill in

1
.
.
1

with 1s, but do nothing about

1
.
.
2

How could you do that? One way is to "copy downwards" creating
a forward fill-in; then to reverse time and in effect "copy upwards"
creating a backward fill-in. You use results if and only if the
two agree.

So the forward fill-in is the same as -empstatus-

. gen forward_fillin = empstatus

except that we copy previous values downwards in a cascade:

. bysort id (month) : replace forward_fillin = forward_fillin[_n-1] if mi(forward_fillin)

Similarly, the backward fill-in is the same as -empstatus-

. gen backward_fillin = empstatus

except that -- after a reversal of time -- we copy previous values
downwards in the same way:

. gen nmonth = -month
. bysort id (nmonth) : replace backward_fillin = backward_fillin[_n-1] if mi(backward_fillin)

The non-missing values on either side of each block of missings
must have been the same if the two fill-ins are the same:

. replace empstatus = forward_fillin if mi(empstatus) & forward_fillin == backward_fillin

Finally, we -sort- again to restore the original time order:

. sort id month

For more background, see the FAQ already cited.

Nick
n.j.cox@durham.ac.uk

Shabbar Jaffry/Yaseen

> Currently I am working on a data set where I have to fill
> the gaps in the data and then
> do analysis. An example of the data is as follows:
>
> id	month		sex	empstatus
> 17	Feb-00		.	.
> 17	Mar-00		.	.
> 17	Apr-00		.	.
> 17	May-00		.	.
> 17	Jun-00		.	1
> 17	Jul-00		.	2
> 17	Aug-00		1	2
> 17	Sep-00		1	2
> 17	Oct-00		1	2
> 17	Nov-00		1	2
> 17	Dec-00		1	2
> 17	Jan-01		1	2
> 17	Feb-01		1	2
> 17	Mar-01		1	2
> 17	Apr-01		1	2
> 17	May-01		1	.
> 17	Jun-01		1	.
> 17	Jul-01		1	.
> 17	Aug-01		1	.
> 17	Sep-01		1	.
> 17	Oct-01		1	.
> 17	Nov-01		1	.
> 17	Dec-01		1	.
> 17	Jan-02		.	.
> 17	Feb-02		.	.
> 17	Mar-02		.	.
> 17	Apr-02		.	.
> 17	May-02		.	.
> 17	Jun-02		.	.
> 164	Mar-98		2	1
> 164	Apr-98		2	1
> 164	May-98		.	.
> 164	Jun-98		2	1
> 502	Jul-98		1	.
> 502	Aug-98		1	.
> 502	Sep-98		1	2
> 502	Oct-98		.	.
> 502	Nov-98		.	.
> 502	Dec-98		.	.
> 502	Jan-99		.	2
> 502	Feb-99		.	1
> 502	Mar-99		.	1
> 502	Apr-99		1	.
> 502	May-99		1	.
> 502	Jun-99		1	.
> 502	Jul-99		1	.
> 502	Aug-99		1	.
> 502	Sep-99		1	.
> 502	Oct-99		1	.
> 502	Nov-99		1	.
>
> where for sex 1 is coded for males and 2 for females and
> for employment status
> (empstatus) 1 is employed and 2 for unemployed. Is there
> any fancy command in stata which can fill the gaps.


```
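The steps in the reply can be pulled together into one do-file. What follows is only a sketch run on a toy panel, not the poster's data. It assumes that -month- is held as a numeric variable (here a plain running integer; in the real data something like a Stata monthly date), since a string such as "Feb-00" would not sort in time order. The toy values are made up for illustration.

```stata
* Toy panel: one individual, months numbered 1 to 6 (hypothetical values)
clear
input id month sex empstatus
1 1 . 1
1 2 1 .
1 3 1 .
1 4 . 1
1 5 1 .
1 6 1 2
end

* sex: check that it is constant within id, then spread the observed value
egen Sex1 = min(sex), by(id)
egen Sex2 = max(sex), by(id)
assert Sex1 == Sex2
replace sex = Sex2 if mi(sex)

* empstatus: forward fill-in, copying previous values downwards
gen forward_fillin = empstatus
bysort id (month) : replace forward_fillin = forward_fillin[_n-1] if mi(forward_fillin)

* empstatus: backward fill-in, the same cascade after reversing time
gen nmonth = -month
gen backward_fillin = empstatus
bysort id (nmonth) : replace backward_fillin = backward_fillin[_n-1] if mi(backward_fillin)

* fill a gap only where the two fill-ins agree
replace empstatus = forward_fillin if mi(empstatus) & forward_fillin == backward_fillin

* restore time order and inspect
sort id month
list id month sex empstatus forward_fillin backward_fillin, sepby(id)
```

On this toy panel, months 2 and 3 sit between two 1s and so are filled with 1, while month 5 sits between a 1 and a 2 and stays missing. Leading and trailing blocks of missings are never filled by this rule, because one of the two fill-ins has nothing to copy from there.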