From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: backfill missing data
Date: Tue, 24 Aug 2010 19:46:41 +0100

To summarise selectively, you got replies from three people independently, David Kantor, Steve Samuels and myself, all pretty experienced Stata people and all of whom said "get your problem into -long- form".

That is not to say that your problem is not soluble in Stata. I am sure it is. The problem is, I suspect, entirely psychological: the people who have looked at this can't wrap their heads around what you want to do because the whole thing is structured in an unfamiliar way.

A better strategy may be to seek help locally within your institution so that you can talk through what your problem is with someone who understands your kind of data and knows more Stata than you do. Or take our advice....

Nick
n.j.cox@durham.ac.uk

David Torres

Nick,

Your point is well taken. I figured, with respect to the shape of the data, I'd get the same response on this question as on the previous one. As I end up reshaping the data to wide format for purposes of imputation, I begin from that point in my questioning. Apologies.

About the relationship between stfin1_* and stfin2_*, and so on, the data I'm working with allows for respondents to report all jobs worked since the date of last interview. stfin1_* refers to the begin and end dates of job1 worked in year *; stfin2_* refers to the begin and end dates of job2 worked in year *. If a job is current at the time of the survey date, then the end date is reported as the survey date. A job is included in a year only if it was ended since the date of last interview; begin dates are irrelevant.

If a respondent misses a year or more, but returns in a later round, they are asked about all jobs held since the date of their last interview. Since begin and end dates are given, and since a job is attached to a specific employer, who is given a unique ID, it is easy to backfill missing information. The ultimate purpose is to tie wages/compensation and hours or weeks worked to a specific year, even if the interview was missed for that year.
This is not information we want to impute since it already exists. I suppose there may be easier ways to pull information from across the years to construct my variables, but I'm not yet a Stata expert.

D

Quoting Nick Cox <n.j.cox@durham.ac.uk>:

> As previously advised,
>
> 1. You have panel data which you are holding in a -wide- structure.
>
> 2. On the whole that's a bad idea and you would be better off with a -long- structure.
>
> Ignoring this advice is likely to deter many Statalist members from paying this kind of question very much attention. Stata has lots of tricks for handling panel data held in a long structure. If you choose to work with panel data held in a wide structure _some_ things are easier but the difficult things are often _much_ more difficult and it needs considerable Stata fluency to avoid messing around for hours and hours and hours with programming.
>
> Homilies aside, I don't understand most of the specifics here. For example I don't understand the relationship between -stfin1_*- and -stfin2_*- and in particular why "stfin2_2000 should be copied to stfin1_1999".
>
> That said, this code might give you ideas on Stata technique which you can modify for your real problem.
>
> foreach y in 1999 1998 {
>     local yp1 = `y' + 1
>     replace stfin1_`y' = stfin1_`yp1' if missing(stfin1_`y')
>     replace stfin2_`y' = stfin2_`yp1' if missing(stfin2_`y')
> }
>
> See also http://www.stata.com/support/faqs/data/missing.html for the "long" approach here and
>
> SJ-9-1  pr0046  . . . . . . . . . . . .  Speaking Stata: Rowwise
>         (help rowsort, rowranks if installed) . . . . . . N. J. Cox
>         Q1/09   SJ 9(1):137--157
>         shows how to exploit functions, egen functions, and Mata
>         for working rowwise; rowsort and rowranks are introduced
>
> for various tips and tricks for the "wide" structure.
>
> Naturally, you will need to compensate in your analysis for this rather rigid imputation.
> You really will have fewer data than might appear to be the case.
>
> Nick
> n.j.cox@durham.ac.uk
>
> David Torres
>
> I'm working with longitudinal data (12 rounds of info collected so far) and need to backfill information for respondents who were not interviewed in a given year subsequent to round 1. Information on my variables of interest, when not collected in a round due to noninterview, can be gathered in the next round in which respondents are interviewed. I'd like to carry that information back so that it fills in the missing cells in the year and job number to which it should apply.
>
> I've concatenated unformatted date variables for each year and job number so that start and finish dates for a job are carried back together. Every pair of numbers, then, including the space in between, represents a start and finish date. All dates here, though for example purposes only, are year specific. An example of what I have, then, is:
>
> pubid stfin1_1998 stfin2_1998 stfin1_1999 stfin2_1999 stfin1_2000 stfin2_2000
> 1 13901 14200 14100 14200 14247 14590
> 2 13890 14198 14310 14525
> 3 14000 14208 14311 14915
> 4 13883 14650 14351 14600 14635 14900
>
> For pubid 1, the values in stfin1_2000 would be copied to stfin1_1999 as it applies to that year. The same goes for pubid 2. In pubid 3, stfin1_2000 should be copied to stfin1_1998 as it applies to that year; stfin2_2000 should be copied to stfin1_1999 since it applies to that year. In pubid 4, stfin1_1999 should be copied to stfin1_1998. I only mean to copy follow-up year information to cells for which current year information is missing, or ". ."
>
> Is there an easy way to do this across several years and job numbers at the same time? Perhaps using a foreach command?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
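
The advice repeated in this thread, "get your problem into -long- form", can be sketched roughly as follows. This is only an illustration under stated assumptions: the two input rows are made up (the thread's example does not show which cells are empty), only the stfin1_ stub is handled, and `negyear` is a hypothetical helper variable. It carries the next later nonmissing value back within each pubid and does not attempt the date-based matching of jobs to years, or the moves between job slots, that David describes.

```stata
* Made-up rows for illustration; which cells are empty is an assumption.
clear
input pubid str11 stfin1_1998 str11 stfin1_1999 str11 stfin1_2000
1 "13901 14200" ""            "14247 14590"
2 ""            ""            "14310 14525"
end

* Wide to long: one row per respondent and year.
reshape long stfin1_, i(pubid) j(year)
rename stfin1_ stfin1

* Carry the next later nonmissing value back in time.
* negyear (hypothetical helper) sorts later years first within pubid,
* and -replace- works down the rows, so runs of gaps fill in one pass.
gen int negyear = -year
bysort pubid (negyear): replace stfin1 = stfin1[_n-1] if missing(stfin1)

* Restore the usual order and inspect the result.
sort pubid year
drop negyear
list, sepby(pubid)
```

With more job slots, the same `reshape long` call can take several stubs at once (for example `stfin1_ stfin2_`), and the `replace` line can be repeated for each resulting variable.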

**References**:

- **st: backfill missing data** *From:* David Torres <torresd@umich.edu>
- **st: RE: backfill missing data** *From:* Nick Cox <n.j.cox@durham.ac.uk>
- **Re: st: RE: backfill missing data** *From:* David Torres <torresd@umich.edu>
