Statalist The Stata Listserver



RE: st: Panel earnings data and coding of time varying indeps


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Panel earnings data and coding of time varying indeps
Date   Thu, 19 Jan 2006 23:59:46 -0000

I guess what is meant here is 

bysort id (time) : gen MOREED2 = sum(MOREED) 
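
For anyone wanting to try this, here is a minimal sketch on toy data. The variable names id and time and the single made-up person are assumptions for illustration; the MOREED values are the sequence from Christer's post. The running sum counts years of further education completed so far, so it keeps growing where the max() version would stop at 1.

```stata
* Toy data: one hypothetical person (id 1) with the MOREED sequence from the thread
clear
input id time MOREED
1  1 0
1  2 0
1  3 0
1  4 1
1  5 1
1  6 0
1  7 0
1  8 0
1  9 0
1 10 0
end

* Running sum of MOREED within id, taken in time order
bysort id (time) : gen MOREED2 = sum(MOREED)

list id time MOREED MOREED2, noobs sepby(id)
* MOREED2 for this person: 0 0 0 1 2 2 2 2 2 2
```

Whether a count of years or a 0/1 "ever enrolled" indicator is the better regressor is a modelling choice, not something the code settles.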

Nick 
n.j.cox@durham.ac.uk 

David Bell
 
> An additional wrinkle, if you conceptualize education as cumulative,  
> is to replace
> 
> replace MOREED2 = max(MOREED, L.MOREED2)
> 
> with
> 
> replace MOREED2 = sum(MOREED, L.MOREED2)

n j cox wrote:
 
> > This calculation is canned as the -egen- function
> > -record()-. A record, recall, is the maximum or
> > minimum seen so far. Here is the whole kit and
> > caboodle, modulo line wraps:
> >
> > . ssc type _grecord.ado
> > *! 1.2.1 CFB/NJC 8 Oct 2001
> > * 1.2.0 CFB/NJC 8 Oct 2001
> > * 1.1.0 CFB 06 Oct 2001
> > program define _grecord
> >         version 6.0
> >         syntax newvarname =/exp [if] [in] [, BY(varlist) ORDER(varlist) MIN ]
> >         tempvar touse obsno
> >         local op = cond("`min'" == "min", "min", "max")
> >         quietly {
> >                 mark `touse' `if' `in'
> >                 gen `typlist' `varlist' = `exp' if `touse'
> >                 gen long `obsno' = _n
> >                 sort `touse' `by' `order' `obsno'
> >                 by `touse' `by': /*
> >         */ replace `varlist' = `op'(`varlist',`varlist'[_n-1]) if `touse'
> >         }
> > end
> >
> > In Christer's case, and in others, you don't need the
> > canned version, although it may come in useful.
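
A hedged sketch of calling the canned version: it assumes the -egenmore- package from SSC (which supplies -record()-) can be installed, and the variable names id and year plus the second, made-up person are illustrative only.

```stata
* Assumption: egenmore is not yet installed; skip this line if it is
ssc install egenmore

* Toy data: id 1 follows the sequence in the question, id 2 is hypothetical
clear
input id year MOREED
1  1 0
1  2 0
1  3 0
1  4 1
1  5 1
1  6 0
1  7 0
1  8 0
1  9 0
1 10 0
2  1 0
2  2 1
2  3 0
end

* Record (maximum so far) of MOREED within id, in year order;
* adding the min option would track the minimum so far instead
egen MOREED2 = record(MOREED), by(id) order(year)

list id year MOREED MOREED2, noobs sepby(id)
```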
> >
> > Assuming a -tsset- panel, you can do it with
> >
> > gen MOREED2 = 0
> > replace MOREED2 = max(MOREED, L.MOREED2)
> >
> > Let's see how this works with
> >
> > 1 0
> > 2 0
> > 3 0
> > 4 1
> > 5 1
> > 6 0
> > 7 0
> > 8 0
> > 9 0
> > 10 0
> >
> > First off, using L. means that the -replace-
> > is done within panels.
> >
> > Second, recall that -replace- uses the sort
> > order of the data. In the first obs, the
> > new value is max(MOREED[1], L.MOREED2[1]).
> > There is no obs before the first, so L.MOREED2[1]
> > evaluates to missing, but this problem is no problem
> > as max(non-missing, missing) is always non-missing.
> > So MOREED2[1] is now 0.
> >
> > In the second obs, we get max(0,0) which again
> > is 0.
> >
> > Same story until the fourth obs, in which we
> > get max(1, 0) which is 1. In the next obs
> > we get max(1, 1) which is 1. But
> > in the next we get max(0,1) which is 1.
> >
> > In fact, once the MOREED2 being -replace-d first hits a value of 1
> > it sticks there, which is what was asked for.
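
The walk-through above can be checked on toy data. This sketch swaps in explicit subscripts under -by- (the same device _grecord.ado uses) in place of L., so it does not need -tsset-; the names id and year and the second person are assumptions made for illustration.

```stata
* Toy data: id 1 is the sequence discussed above, id 2 is hypothetical
clear
input id year MOREED
1  1 0
1  2 0
1  3 0
1  4 1
1  5 1
1  6 0
1  7 0
1  8 0
1  9 0
1 10 0
2  1 0
2  2 1
2  3 0
end

* Start from the raw indicator, then carry the maximum forward within id.
* In the first observation of each id, MOREED2[_n-1] is missing, and
* max(nonmissing, missing) returns the nonmissing argument.
gen MOREED2 = MOREED
bysort id (year) : replace MOREED2 = max(MOREED2, MOREED2[_n-1])

list id year MOREED MOREED2, noobs sepby(id)
* MOREED2 for id 1: 0 0 0 1 1 1 1 1 1 1
* MOREED2 for id 2: 0 1 1
```

Because -replace- works through the observations in sort order, each observation sees the already-updated value from the one before it, which is why the 1 sticks.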
> >
> > Note that the initialisation of MOREED2 is arbitrary.
> > We just need it to exist before it can be -replace-d.
> >
> > Or, not quite. More concisely, you can do it with
> >
> > gen MOREED2 = max(L.MOREED2, MOREED)
> >
> > This may well look bizarre to you, as the RHS refers
> > to a variable that doesn't yet exist, but I have it
> > on good authority that in this case this metaphysical-
> > theological difficulty "non tenet aquam", as Aquinas
> > probably said.
> >
> > At least, this works in Stata 8, which is what
> > I have access to right now.
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> >
> > Christer Thrane
> >
> > In my yearly panel data I follow a number of college undergraduates
> > for about ten years after they graduate. Each year after graduation
> > their yearly earnings are recorded along with a binary variable
> > (called MOREED) that indicates whether or not they were enrolled in
> > "above-undergraduate" education in that year (1 = yes, 0 = no).
> > Thus, if a person during the time period first works full-time for
> > three years after graduation, then goes to the university for two
> > more years, and then returns to working full-time, he or she will
> > have the following panel sequence for the variable MOREED:
> >
> > 0 (year 1), 0 (year 2), 0 (year 3), 1 (year 4), 1 (year 5),
> > and 0, 0, 0, 0, 0 (years 6, 7, 8, 9, and 10).
> >
> > When I, using -xtreg, re-, model log earnings as a function of MOREED
> > and controls, the coefficient on MOREED is large, negative and
> > significant. I guess this makes sense--in the same year as a person
> > goes to school, he or she does not have (enough) time to work
> > full-time.
> >
> > On the other hand, common sense suggests that increasing one's
> > knowledge (by taking up more education) should "pay off" in the
> > longer run. In other words, the coefficient on MOREED should be
> > positive. In this respect, I'm wondering if maybe the problem is
> > the coding of the variable MOREED. From year 5 to 6, this variable
> > changes its value back from 1 to 0. And although this "makes sense"
> > in the data, it does, in my opinion, not make sense in reality--you
> > cannot "go back" from, say, being a graduate to being an
> > undergraduate. My question therefore becomes:
> >
> > How can I change the variable MOREED so that it keeps the value of
> > 1 in the remaining years of each panel after it gets a 1 for the
> > first time?
