Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Lacy,Michael" <Michael.Lacy@colostate.edu>
To: "statalist-digest@hsphsun2.harvard.edu" <statalist-digest@hsphsun2.harvard.edu>
Subject: Re: st: Matching new and old id's in multiple years to create a panel
Date: Tue, 14 Aug 2012 19:17:18 +0000

>From Kirk Geale <kirkgeale@gmail.com>
>To statalist@hsphsun2.harvard.edu
>Subject st: Matching new and old id's in multiple years to create a panel
>Date Sat, 11 Aug 2012 23:57:03 -0400
>
>Hello,
>
>I have separate files containing a year of data where each respondent
>has a unique id. In subsequent years, the id number for the same
>person is different. However, in a given year there is an id variable
>that matches the previous year's id, called id2. As an example, the
>same respondent could have an id code 14503 in the year 2000, but
>94837 in the year 2001. In 2001, there is a second variable id2 that
>records 14503, which is the respondent's id in 2000. If the
>respondent was not surveyed in the previous year, id2 is coded as 0.
>This occurs in all years. I want to match these individuals from year
>to year by generating a new variable that identifies them as the same
>person (where appropriate), for the end goal of survival analysis. I
>initially thought this would be very easy to do, but I can't seem to
>get it. Thanks for any ideas!
>
>Kirk Geale
>

Is this like what you had in mind?

// Example data: Some individuals enter the panel every year, but each
// one has a record for every year after the entry year. I'm presuming
// that years are consecutive integers.
// Note that for regularity, I made "previous id" = 0 even for year 1.
//
clear
input year prev_id curr_id // prev_id is more convenient than id2
1 0 132
2 132 976
3 976 804
4 804 271
1 0 765
2 765 317
3 317 887
4 887 302
2 0 387
3 387 701
4 701 523
2 0 654
3 654 124
4 124 972
3 0 298
4 298 321
4 0 820
end
compress
// Put data into separate files to fit what Kirk already has.
// I'll also do something he probably doesn't have yet,
// which is to name each file with year as a suffix, and
// to rename prev_id and curr_id according to the actual year.
levelsof year, local(ylist)
qui summ year
local lastyear = r(max) // I'll need this value
local firstyear = r(min)
//
quiet {
foreach y of local ylist {
   preserve
   tempfile file`y'
   keep if (year == `y')
   local ym1 = `y' - 1
   rename curr_id id`y'
   rename prev_id id`ym1'
   save `file`y''
   di "`file`y''"
   restore
}
}
// Now, we have the slightly massaged version of Kirk's data.
// The rest is a series of merges, linking ids year by year.
// Start from the first year's file, so that its id variable is in
// memory as the key for the first merge.
//
use "`file`firstyear''", clear
foreach y of local ylist {
   if (`y' < `lastyear') {
      local next = `y' + 1
      // The 1:m, not 1:1, matters: in the using file, the previous-year
      // id is 0 for every new entrant, so it is not unique there.
      qui merge 1:m id`y' using "`file`next''"
      drop _merge
   }
}
// Clean up and put the data into person-year format
drop year id0
qui recode id* (0=.)
gen entry_id = _n
reshape long id, i(entry_id) j(year)
drop if missing(id)
bysort entry_id (year) : replace entry_id = id[1]

Regards,

Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784