st: RE: Re: Missing Data
My reading of this problem differs from others'.
As I see it, what is wanted is to fill in the gaps
using the other data for the same individual.
-sex- and -empstatus- are not quite on the same footing here. In
most datasets it would be accurate to assume that individuals have
one sex or the other throughout a study, so that
. egen Sex = max(sex), by(id)
. bysort id : replace sex = Sex if mi(sex)
would be adequate. We could check the underlying assumption here
in various ways, such as
. egen Sex1 = min(sex), by(id)
. egen Sex2 = max(sex), by(id)
. assert Sex1 == Sex2
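If that -assert- fails, it helps to see which individuals are
inconsistent. A minimal sketch, assuming -Sex1- and -Sex2- exist as
just created (the variable name -tagged- is only for illustration):
. egen byte tagged = tag(id)
. list id Sex1 Sex2 if tagged & Sex1 != Sex2
Here -egen, tag()- marks one observation per individual, so each
offending -id- is listed once only.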
A key feature of -egen, max()- in this problem is that the result
is missing if and only if all the arguments are missing.
Employment status is different, as changes of status could be
common, but there are various ways of guessing at gaps: you don't
say which you regard as acceptable.
One way is just a "copying downwards" technique, discussed in some
detail at http://www.stata.com/support/faqs/data/missing.html
Note that blocks of missing values at the start of each panel need
separate treatment, as there is no earlier value to copy.
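A minimal sketch of copying downwards, working on a copy so that the
original is left alone (the variable name -copied- is only for
illustration, and -month- is assumed to be a numeric monthly date):
. gen copied = empstatus
. bysort id (month) : replace copied = copied[_n-1] if mi(copied)
. count if mi(copied)
Whatever -count- reports is the number of observations that copying
downwards could not reach, namely those before the first non-missing
value in each panel.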
A more conservative method is to fill in a gap if and only if the
status at either end of the gap is the same. So you would fill in
a gap that has 1 at both ends with 1s, but do nothing about a gap
that has 1 at one end and 2 at the other.
How could you do that? One way is to "copy downwards" creating
a forward fill-in; then to reverse time and in effect "copy upwards"
creating a backward fill-in. You use the results if and only if the
two fill-ins agree.
So the forward fill-in is the same as -empstatus-
. gen forward_fillin = empstatus
except that we copy previous values downwards in a cascade:
. bysort id (month) : replace forward_fillin =
forward_fillin[_n-1] if mi(forward_fillin)
Similarly, the backward fill-in is the same as -empstatus-
. gen backward_fillin = empstatus
except that -- after a reversal of time -- we copy previous values
upwards in a cascade:
. gen nmonth = -month
. bysort id (nmonth) : replace backward_fillin =
backward_fillin[_n-1] if mi(backward_fillin)
The non-missing values on either side of each block of missings
must have been the same if the two fill-ins are the same
. replace empstatus = forward_fillin if mi(empstatus) &
forward_fillin == backward_fillin
We -sort- again
. sort id month
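As a check on the logic, here is a minimal sketch on made-up data.
The values are invented purely for illustration, and -clear- drops
whatever is in memory, so try it in a fresh session:
. clear
. input id month empstatus
  1 1 1
  1 2 .
  1 3 .
  1 4 1
  1 5 .
  1 6 2
  1 7 .
  end
. * forward fill-in: copy previous values downwards
. gen forward_fillin = empstatus
. bysort id (month) : replace forward_fillin = forward_fillin[_n-1] if mi(forward_fillin)
. * backward fill-in: reverse time, then copy in the same way
. gen backward_fillin = empstatus
. gen nmonth = -month
. bysort id (nmonth) : replace backward_fillin = backward_fillin[_n-1] if mi(backward_fillin)
. * fill in only where the two agree
. replace empstatus = forward_fillin if mi(empstatus) & forward_fillin == backward_fillin
. sort id month
. list id month empstatus forward_fillin backward_fillin, sepby(id)
Months 2 and 3 should be filled with 1, because that gap has 1 at
both ends. Month 5 stays missing, as the fill-ins disagree (1 forward,
2 backward), and month 7 stays missing, as there is no later value to
copy upwards.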
For more background, see the FAQ already cited.
> Currently I am working on a data set where I have to fill
> the gaps in the data and then
> do analysis. An example of the data is as follows:
> id month sex empstatus
> 17 Feb-00 . .
> 17 Mar-00 . .
> 17 Apr-00 . .
> 17 May-00 . .
> 17 Jun-00 . 1
> 17 Jul-00 . 2
> 17 Aug-00 1 2
> 17 Sep-00 1 2
> 17 Oct-00 1 2
> 17 Nov-00 1 2
> 17 Dec-00 1 2
> 17 Jan-01 1 2
> 17 Feb-01 1 2
> 17 Mar-01 1 2
> 17 Apr-01 1 2
> 17 May-01 1 .
> 17 Jun-01 1 .
> 17 Jul-01 1 .
> 17 Aug-01 1 .
> 17 Sep-01 1 .
> 17 Oct-01 1 .
> 17 Nov-01 1 .
> 17 Dec-01 1 .
> 17 Jan-02 . .
> 17 Feb-02 . .
> 17 Mar-02 . .
> 17 Apr-02 . .
> 17 May-02 . .
> 17 Jun-02 . .
> 164 Mar-98 2 1
> 164 Apr-98 2 1
> 164 May-98 . .
> 164 Jun-98 2 1
> 502 Jul-98 1 .
> 502 Aug-98 1 .
> 502 Sep-98 1 2
> 502 Oct-98 . .
> 502 Nov-98 . .
> 502 Dec-98 . .
> 502 Jan-99 . 2
> 502 Feb-99 . 1
> 502 Mar-99 . 1
> 502 Apr-99 1 .
> 502 May-99 1 .
> 502 Jun-99 1 .
> 502 Jul-99 1 .
> 502 Aug-99 1 .
> 502 Sep-99 1 .
> 502 Oct-99 1 .
> 502 Nov-99 1 .
> where for sex 1 is coded for males and 2 for females and
> for employment status
> (empstatus) 1 is employed and 2 for unemployed. Is there
> any fancy command in stata which can fill the gaps.