Statalist The Stata Listserver


RE: st: RE: tsset

From   "Alexander Nervedi" <>
Subject   RE: st: RE: tsset
Date   Fri, 19 May 2006 17:32:27 +0000

Apologies for any confusion in the way I have been using terms. In my mind there is no missing data. The data set clearly tells me that for county = x, household = 1, year = 1, variable V1(x,h,t) = x111. The data set does, however, have gaps, such as

county household year V1
x 1 1 12
x 1 2 13
x 1 4 12
x 1 7 12

So, since no data are missing, I define a unique household id using

egen uid = group(county household)

county household year V1 uid
x 1 1 12 1
x 1 2 13 1
x 1 4 12 1
x 1 7 12 1

I need this so that I am able to tsset my data set.

tsset uid year

Once tsset, I would like to enter the gaps into the dataset, and tsfill does it for me.

tsfill, full

However, tsfill creates observations with missing values that I actually do know. For variables such as V1 the value is 0, and for identifiers like county and household it has to be the same value within uid. Thus, my data set looks like:

county household year V1 uid
x 1 1 12 1
x 1 2 13 1
. . 3 . 1
x 1 4 12 1
. . 5 . 1
. . 6 . 1
x 1 7 12 1

The coding instructions tell me that V1 = 0 for the missing years. However, I still need to fill in the county and household values in the observations that tsfill created. Currently I am using a sequence of -replace- commands with leads and lags within uid to do this. I was hoping there might be an automated way of doing this.
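One common way to do this without chains of leads and lags is to sort within each uid on the identifier itself and copy the non-missing value across the group. The sketch below assumes, as the post says, that county and household are constant within uid (true by construction, since uid = group(county household)). The second household, its V1 values, and the flag variable `original` are hypothetical, added only for illustration.

```stata
* Sketch: rebuild the example panel, fill the gaps, then restore the
* identifiers and apply the coding rule V1 = 0 for the added years.
clear
input str1 county household year V1
"x" 1 1 12
"x" 1 2 13
"x" 1 4 12
"x" 1 7 12
"x" 2 3 10
"x" 2 7 11
end
* (household 2 above is hypothetical)

egen uid = group(county household)
tsset uid year

* hypothetical flag: 1 for observations present before -tsfill-
generate byte original = 1
tsfill, full
replace original = 0 if missing(original)

* county is a string: "" sorts before "x", so the last observation
* in each uid carries the non-missing value
bysort uid (county): replace county = county[_N] if missing(county)

* household is numeric: . sorts after all numbers, so the first
* observation in each uid carries the non-missing value
bysort uid (household): replace household = household[1] if missing(household)

* coding rule from the post: V1 = 0 only in the years tsfill added
replace V1 = 0 if original == 0

* restore panel order
sort uid year
list, sepby(uid)
```

Sorting on the variable itself, rather than carrying values forward with lags, also handles the years that -tsfill, full- adds before a panel's first observation, as for the hypothetical household 2 here. The flag keeps the V1 = 0 rule from touching any value that was genuinely present before -tsfill-.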

Thanks for your response.

From: "Nick Cox" <>
To: <>
Subject: st: RE: tsset
Date: Fri, 19 May 2006 18:17:05 +0100

The effect of your -egen, group()- is
to lump all the missings on -county-
and/or -household- together. In cases
where -household- is missing but not
-county-, or vice versa, that throws
away some information.

-egen, group() missing- will do a bit
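The difference between the two forms of -egen, group()- can be seen on a toy dataset. This is a minimal sketch; the four observations are made up and chosen only to cover each combination of present and missing identifiers.

```stata
* Sketch: compare -egen, group()- with and without the -missing- option
clear
input str1 county household
"x" 1
"x" .
""  .
""  2
end

* default: any observation with a missing county or household
* gets a missing group value, so all of them are lumped together
egen g_default = group(county household)

* -missing-: missing values (. or "") are treated like any other
* value, so each distinct combination gets its own group
egen g_missing = group(county household), missing

list, clean
```

With the default, only the complete observation ("x", 1) receives a group number; with -missing-, all four combinations are kept apart.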

But the reconstruction of missing data
seems somewhere between difficult and
impossible, at least on the information
you provide.

For example, suppose
you have -county- but not -household-.
There seem to be two possibilities. The
household is in fact one of the other
households in the same county in
your dataset, or it is not. Do you
have any grounds to say which is correct?

Conversely, suppose you have -household-
but not -county-. It may be that your numbering
system will enable you to reconstruct the

Finally, suppose you have neither -household-
nor -county-. If there is a method for
imputing, it must be based on the other variables.


Alexander Nervedi
> I have panel data with gaps. After -tsfill, full- I have a
> complete panel, but there are many covariates, some string and
> some numeric, that look complete but actually are not. For example,
> egen uid = group(county household)
> tsset uid year
> tsfill, full
> will generate missing values for county and household to fill in
> the gaps, even though uid and year are complete. What is a good way
> to fill in the missing observations for variables like county and
> household?


© Copyright 1996–2017 StataCorp LLC