
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: RE: backfill missing data

From   Nick Cox <>
To   "''" <>
Subject   st: RE: backfill missing data
Date   Tue, 24 Aug 2010 17:30:12 +0100

As previously advised, 

1. You have panel data which you are holding in a -wide- structure. 

2. On the whole that's a bad idea and you would be better off with a -long- structure. 

Ignoring this advice is likely to deter many Statalist members from paying this kind of question very much attention. Stata has lots of tricks for handling panel data held in a long structure. If you choose to work with panel data held in a wide structure _some_ things are easier but the difficult things are often _much_ more difficult and it needs considerable Stata fluency to avoid messing around for hours and hours and hours with programming. 
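For comparison, here is a minimal sketch of the -long- approach. It assumes, as in the question, that the -stfin- variables are strings holding "start finish" pairs and that a blank cell means no interview. Within each job number it carries later values back in time. It makes no attempt at the cross-job reassignment asked about for pubid 3.

```stata
* toy data typed in from the example in the question (blank = not interviewed)
clear
input pubid str11 stfin1_1998 str11 stfin2_1998 str11 stfin1_1999 str11 stfin2_1999 str11 stfin1_2000 str11 stfin2_2000
1 "13901 14200" "14100 14200" "" "" "14247 14590" ""
2 "13890 14198" "" "" "" "14310 14525" ""
3 "" "" "" "" "14000 14208" "14311 14915"
4 "" "" "13883 14650" "14351 14600" "14635 14900" ""
end

* wide -> long: one observation per person and year
reshape long stfin1_ stfin2_, i(pubid) j(year)

* order each person's years from latest to earliest, then copy the
* previous (i.e. later-year) value into any gap; -replace- works through
* observations in order, so values cascade back over several years
generate negyear = -year
bysort pubid (negyear): replace stfin1_ = stfin1_[_n-1] if missing(stfin1_)
bysort pubid (negyear): replace stfin2_ = stfin2_[_n-1] if missing(stfin2_)
drop negyear
sort pubid year
```

The same two -replace- lines work unchanged however many years there are, which is the main attraction of the long structure here.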

Homilies aside, I don't understand most of the specifics here. For example I don't understand the relationship between -stfin1_*- and -stfin2_*- and in particular why "stfin2_2000 should be copied to stfin1_1999".  

That said, this code might give you ideas on Stata technique which you can modify for your real problem. 

* work backwards: 2000 fills 1999, then the updated 1999 fills 1998
foreach y in 1999 1998 { 
		local yp1 = `y' + 1 
		replace stfin1_`y' = stfin1_`yp1' if missing(stfin1_`y') 
		replace stfin2_`y' = stfin2_`yp1' if missing(stfin2_`y')
}
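The question says that empty cells may hold ". ." rather than being truly blank. That is what concatenating two missing dates as strings would give. For a string variable -missing()- only recognizes "", so a cell holding ". ." would never be filled. Below is a minimal sketch that first recodes the placeholder and then nests a loop over job numbers inside the loop over years. The toy data reuse pubids 1 and 4 from the question. The ". ." in their gaps is an assumption.

```stata
* toy data: pubids 1 and 4 from the question, with ". ." assumed in the gaps
clear
input pubid str11 stfin1_1998 str11 stfin2_1998 str11 stfin1_1999 str11 stfin2_1999 str11 stfin1_2000 str11 stfin2_2000
1 "13901 14200" "14100 14200" ". ." ". ." "14247 14590" ". ."
4 ". ." ". ." "13883 14650" "14351 14600" "14635 14900" ". ."
end

* recode the ". ." placeholder to an empty string, which missing() recognizes
foreach v of varlist stfin* {
	replace `v' = "" if `v' == ". ."
}

* outer loop over job numbers, inner loop over years (latest first)
foreach j in 1 2 {
	foreach y in 1999 1998 {
		local yp1 = `y' + 1
		replace stfin`j'_`y' = stfin`j'_`yp1' if missing(stfin`j'_`y')
	}
}
list, noobs
```

To handle more job numbers or years, extend the two lists. The loop body stays the same.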

See also for the "long" approach here and 

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

for various tips and tricks for the "wide" structure. 

Naturally, you will need to compensate in your analysis for this rather rigid imputation. You really will have fewer data than might appear to be the case. 
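One way to keep that visible is to flag each cell before it is filled. The imputed values then stay identifiable in later analysis. Below is a minimal sketch for a single job number and year pair. The flag name -imputed1_1999- is hypothetical.

```stata
* toy data: three pubids from the question, 1999 gaps left blank
clear
input pubid str11 stfin1_1999 str11 stfin1_2000
1 "" "14247 14590"
2 "" "14310 14525"
4 "13883 14650" "14635 14900"
end

* flag cells that are about to receive a later-year value, then fill them
generate byte imputed1_1999 = missing(stfin1_1999) & !missing(stfin1_2000)
replace stfin1_1999 = stfin1_2000 if missing(stfin1_1999)
tabulate imputed1_1999
```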


David Torres

I'm working with longitudinal data (12 rounds of info collected so  
far) and need to backfill information for respondents who were not  
interviewed in a given year subsequent to round 1.  Information on my  
variables of interest, when not collected in a round due to  
noninterview, can be gathered in the next round in which respondents  
are interviewed.  I'd like to carry that information back so that it  
fills in the missing cells in the year and job number to which it  
should apply.

I've concatenated unformatted date variables for each year and job  
number so that start and finish dates for a job are carried back  
together.  Every pair of numbers, then, including the space in
between, represents a start and finish date.  All dates here, though
for example purposes only, are year specific.  An example of what I  
have, then, is:

pubid  stfin1_1998  stfin2_1998  stfin1_1999  stfin2_1999  stfin1_2000  stfin2_2000
1      13901 14200  14100 14200                            14247 14590
2      13890 14198                                         14310 14525
3                                                          14000 14208  14311 14915
4                                13883 14650  14351 14600  14635 14900

For pubid 1, the values in stfin1_2000 would be copied to stfin1_1999  
as it applies to that year.  The same goes for pubid 2.  In pubid 3,  
stfin1_2000 should be copied to stfin1_1998 as it applies to that  
year; stfin2_2000 should be copied to stfin1_1999 since it applies to  
that year.  In pubid 4, stfin1_1999 should be copied to stfin1_1998.   
I only mean to copy follow-up year information to cells for which  
current year information is missing, or ". ."

Is there an easy way to do this across several years and job numbers  
at the same time?  Perhaps using a foreach command?
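Each cell holds a "start finish" pair as text, so the two dates will need splitting back into numbers before any analysis. Below is a minimal sketch that uses -split- with its -destring- option. The new names -start1_1999- and -finish1_1999- are hypothetical. A ". ." cell, which is what string(.) + " " + string(.) produces, comes back as two numeric missings.

```stata
* toy data: a ". ." gap (assumed) and one concatenated pair from the question
clear
input pubid str11 stfin1_1999
1 ". ."
4 "13883 14650"
end

* split on the space; -destring- converts the pieces to numbers, "." to missing
split stfin1_1999, parse(" ") generate(sf1_1999_) destring
rename sf1_1999_1 start1_1999
rename sf1_1999_2 finish1_1999
list, noobs
```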


© Copyright 1996–2017 StataCorp LLC