RE: st: differencing

Mon, 14 Nov 2005 16:44:01 -0000

Indeed. Your example is well behaved, but as we all know,
real data need not be. Also, it is easy to get confused on
the details of the -by:-, so that for example

bysort state county :

does not _guarantee_ that -year- is in the right order within
-state- and -county-. Once -tsset-, neither pitfall should
catch you.

Nick
n.j.cox@durham.ac.uk

Eric G. Wruck

> Yeah, I goofed.  For one thing, I entered the data
> incorrectly.  I was trying to follow what Gregor said he
> wanted, which I'm not sure I understood or that he wrote down
> clearly.  I fully acknowledge that using the D. operator
> --which you & Kit suggested-- is probably the way to go.
>
> Nevertheless, I want to try to correct what I did earlier.  I
> added a third observation for one of the state county
> combinations.  I am assuming that Gregor wants a difference
> in employment from one year to the next within state &
> county.  So here goes:
>
> . sort state county year
>
> . l
>
>      +----------------------------------+
>      | year   state   county   employ~t |
>      |----------------------------------|
>   1. |    1       1        1         10 |
>   2. |    2       1        1         20 |
>   3. |    3       1        1         22 |
>   4. |    1       2        1         15 |
>   5. |    2       2        1         30 |
>      +----------------------------------+
>
> . bysort state county: gen diff = employment - employment[_n - 1]
> (2 missing values generated)
>
> . l
>
>      +-----------------------------------------+
>      | year   state   county   employ~t   diff |
>      |-----------------------------------------|
>   1. |    1       1        1         10      . |
>   2. |    2       1        1         20     10 |
>   3. |    3       1        1         22      2 |
>   4. |    1       2        1         15      . |
>   5. |    2       2        1         30     15 |
>      +-----------------------------------------+
>
> If I understand the tsset stuff at all, that approach would
> force Gregor to come to terms with any date gaps & duplicate
> years which my approach glosses over.  Is that right?
>
> Eric
>
> >There are two issues here: what to calculate and
> >how to do it. Eric's example presumes two
> >estimates for each combination of state, county, year
> >and wanting to find the difference between them.
> >Evidently this could arise, but on the face of it
> >I would guess rather at
> >
> >bysort state county (year) : gen diff = emp - emp[_n-1]
> >
> >i.e. the difference between each year and the previous.
> >
> >A more robust approach would be to -tsset-
> >
> >egen countyid = group(state county), label
> >tsset countyid year
> >gen diff = D.emp
> >
> >Nick
> >n.j.cox@durham.ac.uk
> >
> >Eric G. Wruck
> >
> >> You were close but your generate (gen) statement wasn't quite right.
> >>
> >> . bysort year state county: gen employdiff = employment - employment[_n - 1]
> >> (2 missing values generated)
> >>
> >> . l, noobs
> >>
> >>   +---------------------------------------------+
> >>   | year   state   county   employ~    employ~f |
> >>   |---------------------------------------------|
> >>   |    1       1        1        10           . |
> >>   |    1       1        1        15           5 |
> >>   |    2       2        1        20           . |
> >>   |    2       2        1        30          10 |
> >>   +---------------------------------------------+
> >
> >> >My data is structured as follows
> >> >
> >> >year state county employment
> >> >1    1     1      10
> >> >2    1     1      20
> >> >1    2     1      15
> >> >2    2     1      30
> >> >...
> >> >for 6 years, 50 states, and some counties in each state.  I
> >> >have 1.5 million observations.
> >> >
> >> >I want to construct a variable that is the difference in
> >> >employment by year in each state and county.
> >> >
> >> >I tried
> >> >
> >> >by year state county, sort: gen newvar = employment-employment[_n-1]
> >> >
> >> >but that didn't work.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
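
To make the two pitfalls discussed in this thread concrete, here is a minimal sketch comparing the -bysort- and -tsset- routes. It reuses Eric's five observations but, as an assumption made purely for illustration, moves the third observation from year 3 to year 4 so that the first county has a gap. The names diff_by and diff_ts are hypothetical and do not appear in the thread.

```stata
clear
input year state county employment
1 1 1 10
2 1 1 20
4 1 1 22
1 2 1 15
2 2 1 30
end

* Putting year in parentheses sorts on it within each state-county
* group, so [_n-1] really is the preceding observation in time.
* It is still the previous OBSERVATION, not the previous YEAR.
bysort state county (year): gen diff_by = employment - employment[_n-1]

* Declare the panel structure; D. then differences against year t-1
* and returns missing when that year is absent.
egen countyid = group(state county), label
tsset countyid year
gen diff_ts = D.employment

list year state county employment diff_by diff_ts, sepby(countyid)
```

For the year 4 observation, diff_by is 2 (a difference spanning two years, silently) while diff_ts is missing, which flags the gap. Duplicate years are caught even earlier: -tsset- stops with an error about repeated time values within panel rather than letting the differences be computed.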

