Statalist The Stata Listserver


Re: st: Panel earnings data and coding of time varying indeps

From   David Bell
Subject   Re: st: Panel earnings data and coding of time varying indeps
Date   Thu, 19 Jan 2006 16:09:30 -0600

An additional wrinkle, if you conceptualize education as cumulative, is to replace

replace MOREED2 = max(MOREED, L.MOREED2)

with

replace MOREED2 = MOREED + max(0, L.MOREED2)
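
A minimal sketch of the cumulative idea, in case it helps: Stata's sum() takes a single argument and returns a running sum, so under by: it counts the years of further education seen so far within each panel. The panel identifier id, the time variable year, and the new variable MOREED_CUM are assumed names, not ones from the thread.

```stata
* Hypothetical one-person panel; id, year and MOREED_CUM are assumed names.
clear
input id year MOREED
1 1 0
1 2 0
1 3 0
1 4 1
1 5 1
1 6 0
1 7 0
1 8 0
1 9 0
1 10 0
end

* sum() is a running sum; under by: it restarts at the first obs of each panel.
bysort id (year): generate MOREED_CUM = sum(MOREED)

list year MOREED MOREED_CUM, noobs
```

For this person MOREED_CUM is 0 in years 1 to 3, 1 in year 4, and 2 from year 5 onwards.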

Dave Bell

On Jan 19, 2006, at 3:51 PM, n j cox wrote:

This calculation is canned as the -egen- function
-record()-. A record, recall, is the maximum or
minimum seen so far. Here is the whole kit and
caboodle, modulo line wraps:

. ssc type _grecord.ado
*! 1.2.1 CFB/NJC 8 Oct 2001
* 1.2.0 CFB/NJC 8 Oct 2001
* 1.1.0 CFB 06 Oct 2001
program define _grecord
        version 6.0
        syntax newvarname =/exp [if] [in] [, BY(varlist) ORDER(varlist) MIN ]
        tempvar touse obsno
        local op = cond("`min'" == "min", "min", "max")
        quietly {
                mark `touse' `if' `in'
                gen `typlist' `varlist' = `exp' if `touse'
                gen long `obsno' = _n
                sort `touse' `by' `order' `obsno'
                by `touse' `by': /*
                */ replace `varlist' = `op'(`varlist',`varlist'[_n-1]) if `touse'
        }
end

In Christer's case, and in others, you don't need the
canned version, although it may come in useful.
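
For anyone who does want the canned version, here is a hedged sketch of a call. It assumes the egenmore package from SSC is installed (ssc install egenmore), since -record()- is not part of official -egen-, and it uses id and year as assumed names for the panel and time variables.

```stata
* Assumes egenmore is installed:  ssc install egenmore
* Hypothetical two-person panel; id and year are assumed names.
clear
input id year MOREED
1 1 0
1 2 1
1 3 0
1 4 0
2 1 0
2 2 0
2 3 0
2 4 1
end

* Running maximum of MOREED within each person, taken in year order.
egen MOREED2 = record(MOREED), by(id) order(year)

list id year MOREED MOREED2, sepby(id) noobs
```

Person 1 switches to 1 in year 2 and stays there; person 2 only reaches 1 in year 4.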

Assuming a -tsset- panel, you can do it with

gen MOREED2 = 0
replace MOREED2 = max(MOREED, L.MOREED2)

Let's see how this works with

1 0
2 0
3 0
4 1
5 1
6 0
7 0
8 0
9 0
10 0

First off, using L. means that the -replace-
is done within panels.

Second, recall that -replace- uses the sort
order of the data. In the first obs, the
new value is max(MOREED[1], L.MOREED2[1]).
There is no obs before the first, so L.MOREED2[1]
evaluates to missing, but this problem is no problem
as max(non-missing, missing) is always non-missing.
So MOREED2[1] is now 0.

In the second obs, we get max(0,0) which again
is 0.

Same story until the fourth obs, in which we
get max(1, 0) which is 1. In the next obs
we get max(1, 1) which is 1. But
in the next we get max(0,1) which is 1.

In fact, once the MOREED2 being -replace-d first hits a value of 1
it sticks there, which is what was asked for.
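
The walk-through can be checked on the ten-observation example. This sketch uses the explicit-subscript form with by:, mirroring what _grecord does internally, rather than the L. operator; the two should agree when there are no gaps in the years. The names id and year are assumptions.

```stata
* Hypothetical one-person panel matching the example; id and year are assumed names.
clear
input id year MOREED
1 1 0
1 2 0
1 3 0
1 4 1
1 5 1
1 6 0
1 7 0
1 8 0
1 9 0
1 10 0
end

* Arbitrary initialisation, then a running maximum within each panel.
* -replace- works through the observations in order, so MOREED2[_n-1]
* is the value already updated in the previous observation.
generate MOREED2 = 0
bysort id (year): replace MOREED2 = max(MOREED, MOREED2[_n-1])

list year MOREED MOREED2, noobs
```

MOREED2 comes out as 0 for years 1 to 3 and 1 from year 4 onwards.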

Note that the initialisation of MOREED2 is arbitrary.
We just need it to exist before it can be -replace-d.

Or, not quite. More concisely, you can do it with

gen MOREED2 = max(MOREED, L.MOREED2)

This may well look bizarre to you, as the RHS refers
to a variable that doesn't yet exist, but I have it
on good authority that in this case this metaphysical-
theological difficulty "non tenet aquam", as Aquinas
probably said.

At least, this works in Stata 8, which is what
I have access to right now.


Christer Thrane

In my yearly panel data I follow a number of college undergraduates for
about ten years after they graduate. Each year after graduation their yearly earnings are recorded along with a binary variable (called MOREED) that indicates whether or not they were enrolled in "above-undergraduate" education in that year (1 = yes, 0 = no). Thus, if a person first works full-time for three years after graduation, then goes to university for two more years, and then returns to working full-time, he or she will have the following panel sequence for the variable MOREED:

0 (year 1), 0 (year 2), 0 (year 3), 1 (year 4), 1 (year 5), 0, 0, 0, 0, and 0
(years 6, 7, 8, 9, and 10).

When I, using -xtreg, re-, model log earnings as a function of MOREED and
controls, the coefficient on MOREED is large, negative and significant. I guess this makes sense: in the same year as a person goes to school, he or she does not have (enough) time to work full-time.

On the other hand, common sense suggests that increasing one's knowledge (by taking up more education) should "pay off" in the longer run. In other words, the coefficient on MOREED should be positive. In this respect, I'm wondering if maybe the problem is the coding of the variable MOREED. From year 5 to 6, this variable changes its value back from 1 to 0. And although this "makes sense" in the data, it does not, in my opinion, make sense in reality: you cannot "go back" from, say, being a graduate to being an undergraduate. My question therefore becomes:

How can I change the variable MOREED so that it keeps the value of 1 in the remaining years of the panel after it gets a 1 for the first time?
