Re: st: How to generate lags where each variable to be lagged has multiple values in the previous time periods

From   Sergiy Radyakin
Subject   Re: st: How to generate lags where each variable to be lagged has multiple values in the previous time periods
Date   Mon, 29 Apr 2013 15:39:33 -0400

Or, explicitly, the following code should achieve the same result
(though most of it is data generation). Best, Sergiy Radyakin.

program drop _all
program define generate_example_data
 generate SchoolID=.
 generate Year=.
 generate Grade=.
 generate Score=.
 forval y=2000/2013 {
   forval g=1/12 {
     forval sch=1000/1002 {
       quietly {
         * append one empty observation, then fill it in
         set obs `=_N+1'
         replace Year=`y' in L
         replace SchoolID=`sch' in L
         replace Grade=`g' in L
         replace Score=runiform()*100 in L
       }
     }
   }
 }

 format Score %6.2f

 sort SchoolID Year Grade
end

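program define smartlag, sortpreserve
  version 9.0

  syntax varlist, over(varlist) [stub(string) by(varlist)]

  if (`"`stub'"'=="") local stub="lag_"
  * over() and by() together must uniquely identify the observations
  isid `over' `by'

  tempfile laggeddata

  * assumed step (missing from the archived post): work on a reduced copy
  * of the data so that the original can be restored before merging
  preserve
  keep `varlist' `over' `by'

  * shift every over() variable forward by one, so that each row of the
  * copy lines up with the observation one period (and one grade) later
  foreach v in `over' {
    quietly replace `v'=`v'+1
  }

  foreach v in `varlist' {
    rename `v' `stub'`v'
  }

  sort `over' `by'

  save `"`laggeddata'"'

  * assumed step (missing from the archived post): bring back the original data
  restore

  sort `over' `by'

  * old-style merge syntax, valid under version 9.0; nokeep drops shifted
  * rows that have no match in the original data
  merge `over' `by' using `"`laggeddata'"', nokeep
  drop _merge
end
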

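* assumed calls (missing from the archived post): start from an empty
* dataset and build the example data before listing it
drop _all
generate_example_data
list, sepby(SchoolID)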
smartlag Score, stub(lag_) by(SchoolID) over(Year Grade)
sort SchoolID Year Grade
list SchoolID Year Grade Score lag_Score, sepby(SchoolID)
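
For comparison, the cohort-identifier idea in the quoted message below can be sketched for data with several schools. This is only a sketch under assumptions: it uses the variable names of the example data above (SchoolID, Year, Grade, Score), and cohort, panel_id and lag_Score_ts are hypothetical names introduced here. Because Year - Grade alone repeats across schools, it is combined with SchoolID to form the panel identifier.

```stata
* Sketch only: assumes SchoolID, Year, Grade and Score exist as in the example data.
* cohort, panel_id and lag_Score_ts are hypothetical names used for illustration.

* Year - Grade is constant for a class as it moves up one grade per year
generate cohort = Year - Grade

* one panel per school-cohort pair
egen panel_id = group(SchoolID cohort)

* declare the panel structure; note that tsset re-sorts the data
tsset panel_id Year

* Score of the same school and cohort one year (and so one grade) earlier
generate lag_Score_ts = L.Score
```

The new variable is missing wherever the school has no record for that cohort in the previous year. If lag_Score from smartlag is also in memory, the two variables should agree, which can be checked with `assert lag_Score_ts == lag_Score`.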

On Mon, Apr 29, 2013 at 3:32 PM, Nick Cox wrote:
> Focus on any cohort, say the cohort that was grade 8 in 2011, grade 7
> in 2010 and so forth. Evidently, the difference (year - grade) is
> constant, and therefore an identifier, for that cohort. Thus after
> gen id = year - grade
> either
> tsset id year
> or
> tsset id grade
> defines a panel dataset with an identifier and a time variable and
> time series operators can then be applied.
> Nick
> On 29 April 2013 19:46, Stuart Buck wrote:
>> Passage rates for all Texas schools for 2008, 2009, 2010, and 2011 --
>> this is important -- by grade. So each row in the dataset is School,
>> Year, Grade, and then scores (plus other demographic variables, etc.).
>> In other words, the dataset looks like this:
>> Year     SchoolID     Grade     TestScore
>> 2011     1            6         ***
>> 2011     1            7         ***
>> 2011     1            8         ***
>> And so on and so forth -- multiple grades in each school in each year.
>> Here's what I want:
>> To be able to regress any given school's performance in Grade X in
>> Year T on, among other things, how that same school did with the same
>> cohort of kids in the previous grade (Grade X-1) in the previous year
>> (Year T-1). I.e., if a middle school's Grade 8 passage rate in 2011 is
>> the outcome, I'd like to be able to control for that same school's
>> Grade 7 passage rate in 2010, thus giving a somewhat crude measure of
>> how much that group of kids progressed since the previous year.
>> How would I generate an all-purpose lagged TestScore variable for all
>> the schools in the dataset, lagging by both year and grade at once?
>> All the Stata instructional material I see on lagged variables just
>> lags based on time, not on both time and some other variable too
>> (grade).