Re: st: Carry over information on time-invariant covariate to all observations of a household?


From   David Kantor <kantor.d@att.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Carry over information on time-invariant covariate to all observations of a household?
Date   Tue, 03 Aug 2010 10:11:34 -0400

At 04:28 AM 8/3/2010, Jen Zhen wrote:
Dear Listers,

suppose I have a panel with dimensions Household and Year. I have the
time-invariant household characteristic X. Information on it is
currently given for each household in only one of the years, but I
would like to carry over this information to all observations of each
household.

To illustrate, the dataset looks like this:

HH   Year    X
1     1990     5
1     1991     .
1     1992     .
2     1990     .
2     1991     3
2     1992     .
3     1990     .
3     1991     .
3     1992     2

and I would like to fill in the missing values.

Currently my way of doing it is this:
- bysort HH: egen X2 = max(X) -
- replace X = X2 -

However, in a large dataset running this command takes forever, so I
am wondering whether there is a faster way to do this?

In addition to other advice given, you may want to look into -carryforward- on SSC.
ssc desc carryforward
ssc inst carryforward

It will carry forward the latest value in the sort order (presumably by HH and some other variable), whereas the advice given by Martin Weiss will carry the lowest value in each group. I noticed that the one observation (per HH group) that has a value is not always the first. Your method needs to address whether you want that value spread in both directions or just forward.
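For comparison, carrying the lowest value in each group can be sketched with built-in commands only. This is an illustration and may not match the earlier reply exactly; it relies on Stata sorting numeric missing values after all nonmissing values, so the lowest nonmissing X ends up first within each HH.

```stata
* Sketch only: spread the lowest nonmissing X to every observation of a household.
* Within each HH, sort on X so that the lowest nonmissing value is observation 1
* (numeric missings sort last).
bysort HH (X): replace X = X[1] if missing(X)

* Restore the panel order afterwards.
sort HH Year
```

Households with no nonmissing X at all are left unchanged, since X[1] is then missing too.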

If you do use -carryforward-, you will need to decide on the sort order, which might be HH X or HH Year. If it is HH X, it will carry the greatest value forward, which is the same as the lowest if there is only one observation per HH group having a value in X. If you use HH Year, then it will carry values only into later years, not into earlier years. If you want the value spread backwards as well, you can follow it with a backward -carryforward-:
bysort HH (Year): carryforward X ...
gen int negyear = -Year
bysort HH (negyear): carryforward X ...

This may be a more generally correct method, if there are any HHs with more than one value for X. But if you are certain that all HHs have only one value for X, then the advice in Martin Weiss's first reply is correct and simplest.
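The forward-then-backward approach can be tried on the example data from the question. This is a minimal sketch, assuming -carryforward- has been installed from SSC and that overwriting X in place via its -replace- option is what is wanted (use -gen()- instead to keep the original); check -help carryforward- for the options in your installed version.

```stata
* Rebuild the example panel from the question.
clear
input HH Year X
1 1990 5
1 1991 .
1 1992 .
2 1990 .
2 1991 3
2 1992 .
3 1990 .
3 1991 .
3 1992 2
end

* Forward pass: fill later years from the most recent nonmissing value.
bysort HH (Year): carryforward X, replace

* Backward pass: reverse the time order within household and carry again,
* which fills the years before the first nonmissing value.
gen int negyear = -Year
bysort HH (negyear): carryforward X, replace

* Clean up and inspect the result.
drop negyear
sort HH Year
list, sepby(HH)
```

With exactly one nonmissing X per household, as in this example, every observation should end up with that household's value. If a household had several different values, each missing year would take the nearest earlier value, and years before the first value would take that first value.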

HTH
--David
