
RE: st: differencing


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: differencing
Date   Mon, 14 Nov 2005 16:44:01 -0000

Indeed. Your example is well behaved, but 
as we all know, real data need not be. Also, 
it is easy to get confused on the details 
of the -by:-, so that for example 

bysort state county : 

does not _guarantee_ that -year- is in the 
right order within -state- and -county-. 
Once -tsset-, neither pitfall should catch you. 
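
A minimal sketch of the point, on made-up data rather than anything 
from the thread: the observations are deliberately entered out of 
order, and year 2 is missing for the first county. The names 
-diff_by- and -diff_ts- are just illustrative. 

```stata
* Toy data (hypothetical): year 2 is absent for state 1, county 1
clear
input year state county employment
3 1 1 22
1 1 1 10
1 2 1 15
2 2 1 30
end

* (year) sorts on year within state and county, but the groups
* are still defined by state and county only
bysort state county (year) : gen diff_by = employment - employment[_n-1]

* -tsset- declares the panel structure; it will refuse repeated
* years within a panel, and D. respects gaps in the time variable
egen countyid = group(state county), label
tsset countyid year
gen diff_ts = D.employment

list, sepby(countyid)
```

With these data -diff_by- for year 3 in the first county should be 
12, the difference from year 1, because -[_n-1]- just means the 
previous observation. -diff_ts- should be missing there, because 
there is no year 2 to difference against. 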

Nick 
n.j.cox@durham.ac.uk 

Eric G. Wruck
 
> Yeah, I goofed.  For one thing, I entered the data 
> incorrectly.  I was trying to follow what Gregor said he 
> wanted, which I'm not sure I understood or that he wrote down 
> clearly.  I fully acknowledge that using the D. operator 
> --which you & Kit suggested-- is probably the way to go. 
> 
> Nevertheless, I want to try to correct what I did earlier.  I 
> added a third observation for one of the state county 
> combinations.  I am assuming that Gregor wants a difference 
> in employment from one year to the next within state & 
> county.  So here goes:
> 
> . sort state county year
> 
> . l
> 
>      +----------------------------------+
>      | year   state   county   employ~t |
>      |----------------------------------|
>   1. |    1       1        1         10 |
>   2. |    2       1        1         20 |
>   3. |    3       1        1         22 |
>   4. |    1       2        1         15 |
>   5. |    2       2        1         30 |
>      +----------------------------------+
> 
> . bysort state county: gen diff = employment - employment[_n - 1]
> (2 missing values generated)
> 
> . l
> 
>      +-----------------------------------------+
>      | year   state   county   employ~t   diff |
>      |-----------------------------------------|
>   1. |    1       1        1         10      . |
>   2. |    2       1        1         20     10 |
>   3. |    3       1        1         22      2 |
>   4. |    1       2        1         15      . |
>   5. |    2       2        1         30     15 |
>      +-----------------------------------------+
>  
> 
> 
> If I understand the tsset stuff at all, that approach would 
> force Gregor to come to terms with any date gaps & duplicate 
> years which my approach glosses over.  Is that right?
> 
> 
> Eric
> 
> 
> 
> >There are two issues here: what to calculate and
> >how to do it. Eric's example presumes two
> >estimates for each combination of state, county, year
> >and wanting to find the difference between them.
> >Evidently this could arise, but on the face of it
> >I would guess rather at
> >
> >bysort state county (year) : gen diff = emp - emp[_n-1]
> >
> >i.e. the difference between each year and the previous.
> >
> >A more robust approach would be to -tsset-
> >
> >egen countyid = group(state county), label
> >tsset countyid year
> >gen diff = D.emp
> >
> >Nick
> >n.j.cox@durham.ac.uk
> >
> >Eric G. Wruck
> > 
> >> You were close but your generate (gen) statement wasn't quite right.
> >>
> >>
> >> . bysort year state county: gen employdiff = employment - employment[_n - 1]
> >> (2 missing values generated)
> >>
> >> . l, noobs
> >>
> >>   +---------------------------------------------+
> >>   | year   state   county   employ~    employ~f |
> >>   |---------------------------------------------|
> >>   |    1       1        1         10          . |
> >>   |    1       1        1         15          5 |
> >>   |    2       2        1         20          . |
> >>   |    2       2        1         30         10 |
> >>   +---------------------------------------------+
> > 
> >> >My data is structured as follows
> >> >
> >> >year    state   county   employment
> >> >1            1         1            10
> >> >2            1         1            20
> >> >1            2         1            15
> >> >2            2         1            30
> >> >...
> >> >for 6 years, 50 states, and some counties in each state. I
> >> >have 1.5 million observations.
> >> >
> >> >I want to construct a variable that is the difference in
> >> >employment by year in each state and county.
> >> >
> >> >I tried
> >> >
> >> >by year state county, sort: gen newvar = employment-employment[_n-1]
> >> >but that didn't work.
