Statalist The Stata Listserver


Re: RE: st: RE: tsset

From   n j cox
Subject   Re: RE: st: RE: tsset
Date   Sun, 21 May 2006 17:28:19 +0100

Alexander Nervedi

Apologies for any confusion in the way I have been using terms. In my mind there is no missing data: the data set clearly tells me that for county = x, household = 1, year = 1, the variable V1(x,h,t) = x111. The data set does, however, have gaps such as

county household year V1
x 1 1 12
x 1 2 13
x 1 4 12
x 1 7 12

So without any missing data, I define a unique household id using

egen uid = group(county household)

county household year V1 uid
x 1 1 12 1
x 1 2 13 1
x 1 4 12 1
x 1 7 12 1

I need this so that I am able to tsset my data set.

tsset uid year

Once tsset, I would like to enter the gaps into the dataset, and tsfill does
it for me.

tsfill, full

However, using tsfill creates missing observations whose values I actually
do know. For variables it is a 0, and for identifiers like county and
household, it has to be the same value within uid. Thus, my data set looks like

county household year V1 uid
x 1 1 12 1
x 1 2 13 1
. . 3 . 1
x 1 4 12 1
. . 5 . 1
. . 6 . 1
x 1 7 12 1
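
The steps above can be reproduced on the four example rows with a minimal sketch like the one below. It assumes county is stored as a string (str1) and household as numeric; the post does not state the storage types. For a string variable, the observations added by tsfill hold the empty string "" rather than ".".

```stata
* Minimal sketch: rebuild the example and let tsfill add the gap years.
* Assumption: county is str1, household is numeric (types not stated in the post).
clear
input str1 county household year V1
"x" 1 1 12
"x" 1 2 13
"x" 1 4 12
"x" 1 7 12
end

* One id per county-household pair
egen uid = group(county household)

* Declare the panel structure, then add observations for the missing years
tsset uid year
tsfill, full

* county, household and V1 are missing in the rows that tsfill added
list, sepby(uid)
```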

The coding instructions tell me that V1 = 0 for the missing years. However,
I still need to fill in the missing values of the county and household
variables in the observations that tsfill created. Currently, I am using a
sequence of replace statements with leads and lags within uid to fill these.
I was hoping there may be an automated way of doing this.

thanks for your response.

>>> This appears to fall under the FAQ

How can I replace missing values with previous or following nonmissing values?
How can I replace missing values within sequences?

Note that

. search missing

would have pointed you to this directly.

However, applying the rules appears a little tricky
in your case as

. sort county household

will mess up your sort order. It would seem that

replace county = county[_n-1] if mi(county)
replace household = household[_n-1] if mi(household)

should work.
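
A slightly more defensive variant of the same idea is sketched below. It assumes the data are still tsset by uid and year, as in the question. Working by uid means a value is never copied from the previous panel, and a second pass in reverse time order covers years that -tsfill, full- adds before a panel's first observed year. The variable negyear is a hypothetical helper introduced only for this sketch. The last line assumes V1 had no missing values before tsfill, as the poster states.

```stata
* Sketch only: fill identifiers within each panel, never across panels.

* Pass 1: copy the previous nonmissing value forward within uid.
* -replace- works through observations in order, so values cascade across runs of gaps.
bysort uid (year): replace county = county[_n-1] if mi(county)
bysort uid (year): replace household = household[_n-1] if mi(household)

* Pass 2: the same in reverse time order, to fill gaps at the start of a panel.
* negyear is a hypothetical helper variable.
generate long negyear = -year
bysort uid (negyear): replace county = county[_n-1] if mi(county)
bysort uid (negyear): replace household = household[_n-1] if mi(household)
drop negyear

* Restore the panel sort order
sort uid year

* Assumption: every missing V1 comes from a row added by tsfill
replace V1 = 0 if mi(V1)
```

Because mi() treats the empty string as missing, the same lines should work whether county is string or numeric.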


Nick Cox

>The effect of your -egen, group()- is
>to lump all the missings on -county-
>and/or -household- together. In cases
>where -household- is missing but not
>-county-, or vice versa, that throws
>away some information.
>-egen, group() missing- will do a bit
>But the reconstruction of missing data
>seems somewhere between difficult and
>impossible, at least on the information
>you provide.
>For example, suppose
>you have -county- but not -household-.
>There seem two possibilities. The
>household is in fact one of the other
>households in the same county in
>your dataset, or it is not. Do you
>have any grounds to say which is correct?
>Conversely, suppose you have -household-
>but not -county-. It may be that your numbering
>system will enable you to reconstruct the
>Finally, suppose you have neither -household-
>nor -county-. If there is a method for
>imputing, it must be based on the other variables.
>Alexander Nervedi
> >
> > I have panel data with gaps. After tsfill, full I have a
> > complete data set, but there are many covariates, some string
> > and some numeric, that become complete but are actually not.
> > For example:
> >
> > egen uid = group(county household)
> > tsset uid year
> > tsfill, full
> >
> >
> > will generate missing values for county and household to fill
> > in the gaps, even though uid and year are complete. What is a
> > good way to fill in missing observations for variables like
> > county and household?