
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: RE: carryforward


From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: RE: RE: carryforward
Date   Mon, 23 Jan 2012 14:01:28 -0500

I never expected this to invoke such a reaction.
But then, I should have explained (and maybe I should also explain in the .hlp file and the SSC description) that it was not intended as a method of imputing missing values.

The intent is to fill in values in "holes", where a value is understood to prevail until explicitly changed. This assumes that the data are sorted such that the concept of "prevail until" makes sense; typically this is time-based.

The scenario where I typically use this is where you have two or more datasets that represent changes in different attributes over time -- say a person's salary and marital status. (Note that not all are numeric.) Each dataset should be uniquely sorted on person-id and date. But the changes may occur on different dates in the different datasets. Also, these datasets should have non-missing values for the pertinent variables.

The datasets are merged. This leaves holes wherever one attribute changed on a given date but the other did not -- corresponding to unmatched records in the merge. E.g., if a salary change occurred on a particular date, but not a change of marital status, then the merged record would have a missing value for marital status, and vice versa. What you then want is to carry the prevailing value from one record to the next, until a nonmissing value is encountered.
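A minimal sketch of how such holes arise, using made-up data for one person. The dates, values, and the tempfile name are hypothetical; the variable names follow the example in this message, and the -merge 1:1- syntax assumes Stata 11 or later.

```stata
* Hypothetical toy data: salary changes for one person
clear
input person_id str9 datestr salary
1 "01jan2011" 50000
1 "01jul2011" 55000
end
generate date = date(datestr, "DMY")
format date %td
drop datestr
tempfile salary
save `salary'

* Hypothetical toy data: marital-status changes for the same person
clear
input person_id str9 datestr str8 marital_status
1 "01jan2011" "single"
1 "01mar2011" "married"
end
generate date = date(datestr, "DMY")
format date %td
drop datestr

* Match on person and date; unmatched records become the holes
merge 1:1 person_id date using `salary'
sort person_id date
list person_id date salary marital_status _merge, sepby(person_id)
```

The 01mar2011 record has no salary and the 01jul2011 record has no marital status; those are the holes to be filled.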

You also want to interrupt the process when a new person_id is encountered. Then you would use -by-:
        by person_id (date): carryforward salary marital_status, replace
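For readers who want to see the logic spelled out, here is a rough equivalent using only built-in commands. It is a sketch of the same idea, not the internals of -carryforward-; the data are made up, and date is just an integer period here.

```stata
* Hypothetical merged data with holes (. and "" mark unmatched records)
clear
input person_id date salary str8 marital_status
1 1 50000 "single"
1 2 .     "married"
1 3 55000 ""
2 1 .     "single"
2 2 40000 ""
end

* Fill each hole from the previous record within the same person.
* -replace- works through the observations in order, so a run of
* consecutive holes is filled from the last nonmissing value.
bysort person_id (date): replace salary = salary[_n-1] if missing(salary)
bysort person_id (date): replace marital_status = marital_status[_n-1] if missing(marital_status)
list, sepby(person_id)
```

Person 2's first salary stays missing, because there is no earlier record for that person to carry from; the -by- prefix is what stops values from spilling over from person 1.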

Finally, note that there may be instances where there are missing values in the original data, and you would not want to carry values into and through the corresponding merged records. (E.g., a missing value in the salary dataset; there was a salary change on Jan 23, 2012, but you don't know what it was.) There are ways to handle that as well.
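One possible way to do this with built-in commands is sketched below; it is only an illustration and not necessarily the approach meant above. It assumes a hypothetical indicator, in_salary, that marks records which came from the salary dataset (in practice it could be derived from _merge after the merge). Only records that did not come from the salary dataset are treated as holes.

```stata
* Hypothetical data: in_salary == 1 marks records that originated in
* the salary dataset. Period 3 is a real salary change whose amount
* is unknown; periods 2 and 4 are holes created by the merge.
clear
input person_id date salary in_salary
1 1 50000 1
1 2 .     0
1 3 .     1
1 4 .     0
1 5 60000 1
end

* Fill holes only. The unknown salary in period 3 is left alone, and
* because the fill copies the previous record, the hole in period 4
* inherits "unknown" rather than the stale 50000.
bysort person_id (date): replace salary = salary[_n-1] if missing(salary) & !in_salary
list, sepby(person_id)
```

After this, period 2 carries 50000, while periods 3 and 4 remain missing until the known value in period 5.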

I hope this is helpful.
--David

At 12:58 PM 1/23/2012, Nick Cox wrote:
If this method is used to impute missing values, which in practice would vary, by a constant equal to the last observed value, then as Tony implies it clearly can be problematic.

But the method of replacing missing values by previous non-missing values is one I often use with small datasets entered by hand. When the observations come in blocks, I only need to type in values for the first identifier in each block, and then -replace- appropriately.

Sometimes datasets arrive like that too. Only the first value in a block of some blocked variable is explicit, so you have to fill in (or fill out) implied similar values.

Nick
[email protected]

Lachenbruch, Peter

This method has been seriously questioned and gives very poor answers generally. A true p-value may be reported anywhere from 0.01 to 0.15 when it should be 0.05. I strongly urge it not be used.

________________________________________
From: [email protected] [[email protected]] On Behalf Of David Kantor [[email protected]]

Once again, thanks to Kit Baum, a new version of -carryforward- is
available on SSC.
This upgrade adds the -if- and -in- qualifiers.
Actually, the upgrade was written a long time ago, but never got
uploaded until now. Sorry if that was my fault.

-carryforward- carries values from one observation to the next,
filling in missing values.

-ssc install carryforward-


