Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Average of last two previous observations conditional on another variable value |
Date | Mon, 24 Mar 2014 14:06:37 +0000 |
I struggled with this mightily until I got a better way to think about it. What's clear is that a loop over observations is the solution of last resort, although long-term readers of this list should be able to recall that that doesn't rule out such a solution being offered.... The trick is that the two previous values you want are easy to identify *once you have a dataset that contains only positive values for -dummy-*. In your example -id- is redundant given -year- and is manifestly not a panel identifier. So, I faked a panel identifier for a single panel. The code should work with an equivalent real panel identifier. . clear . input id year dummy y average id year dummy y average 1. 1 1990 0 10 . 2. 2 1991 0 11 . 3. 3 1992 1 12 . 4. 4 1993 1 13 . 5. 5 1994 0 14 12.5 6. 6 1995 0 15 12.5 7. 7 1996 1 16 12.5 8. 8 1997 1 17 14.5 9. 9 1998 1 18 16.5 10. 10 1999 0 19 17.5 11. end . gen panelid = 42 . save mymaster file mymaster.dta saved . keep if dummy (5 observations deleted) . bysort panelid (year) : gen time = _n . tsset panelid time panel variable: panelid (strongly balanced) time variable: time, 1 to 5 delta: 1 unit . gen work = (y + L.y)/2 (1 missing value generated) . keep panelid year work . merge 1:1 panelid year using mymaster Result # of obs. ----------------------------------------- not matched 5 from master 0 (_merge==1) from using 5 (_merge==2) matched 5 (_merge==3) ----------------------------------------- . tsset panelid year panel variable: panelid (strongly balanced) time variable: year, 1990 to 1999 delta: 1 unit . replace work = L.work if missing(work) (3 real changes made) . gen newaverage = L.work (4 missing values generated) . drop _merge work . list +-------------------------------------------------------+ | year panelid id dummy y average newave~e | |-------------------------------------------------------| 1. | 1990 42 1 0 10 . . | 2. | 1991 42 2 0 11 . . | 3. | 1992 42 3 1 12 . . | 4. | 1993 42 4 1 13 . . | 5. | 1994 42 5 0 14 12.5 12.5 | |-------------------------------------------------------| 6. | 1995 42 6 0 15 12.5 12.5 | 7. | 1996 42 7 1 16 12.5 12.5 | 8. | 1997 42 8 1 17 14.5 14.5 | 9. | 1998 42 9 1 18 16.5 16.5 | 10. | 1999 42 10 0 19 17.5 17.5 | +-------------------------------------------------------+ For convenience here is the code as a block, with commentary. * set up example clear input id year dummy y average 1 1990 0 10 . 2 1991 0 11 . 3 1992 1 12 . 4 1993 1 13 . 5 1994 0 14 12.5 6 1995 0 15 12.5 7 1996 1 16 12.5 8 1997 1 17 14.5 9 1998 1 18 16.5 10 1999 0 19 17.5 end * we need a panel identifier, here just for a single panel gen panelid = 42 save mymaster * reduce the dataset to just -dummy==1- and -tsset- with pseudotime keep if dummy bysort panelid (year) : gen time = _n tsset panelid time * we want the average of this y and the previous y: works w/ panels too gen work = (y + L.y)/2 * now -merge- back and -tsset- as usual keep panelid year work merge 1:1 panelid year using mymaster tsset panelid year * fill in gaps; it's just carry forward replace work = L.work if missing(work) * finishing tape in sight; mine's a red wine gen newaverage = L.work drop _merge work list Nick njcoxstata@gmail.com On 24 March 2014 13:15, Irene Ferrari <irene.ferrari85@gmail.com> wrote: > Dear Statalisters, > > I am a new statalister and I would greatly appreciate if you could > give me advice on something which seems trivial to explain, less > trivial to translate in Stata code. > > I have a panel dataset, and I would like to perform the average > (variable "average") of the last two previous in time observations of > "y" whose corresponding dummy variable ("dummy") is equal to 1. > > Example: > > -------------------------------------------------------------- > id year dummy y average > 1 1990 0 10 . > 2 1991 0 11 . > 3 1992 1 12 . > 4 1993 1 13 . > 5 1994 0 14 12.5 > 6 1995 0 15 12.5 > 7 1996 1 16 12.5 > 8 1997 1 17 14.5 > 9 1998 1 18 16.5 > 10 1999 0 19 17.5 > -------------------------------------------------------------- > > So, for example, when computing the average for year 1996, I need the > programme to search back the two previous y such that the > corresponding dummy is equal to 1, which are y==12 and y==13, and then > compute the average. > > Here is the latest code (of many) I was trying, with no success (it > seems stuck in an infinite loop...): > > replace average = . > local k=1 > local r=0 > local N = _N > local z=1 > forvalues i = 1/`N' { > while `k'<=2 { > if dummy[`i'-`z'] == 1 { > local r= > `r'+ y[`i'-`z'] > local k=`k'+1 > local z=`z'+1 > } > else { > local z=`z'+1 > } > } > replace average=(`r'/2) in `i' > local k=1 > local r=0 > local z=1 > } > > > I have never needed to perform complicated programming in Stata, so it > is totally possible I am missing something obvious. > Thanks for your consideration. > > > Irene Ferrari > PhD Student - Department of Economics > University of Bologna * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/