Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Steven Samuels <sjsamuels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Identify Person in a rotating panel |
Date | Fri, 4 Feb 2011 22:45:20 -0500 |
I doubt that you have to write your own program. Try -reclink- by Michael Blasnik ("findit") or one of Stata's other contributed matching programs (e.g. -cem-, -psmatch2-). In many panel studies, the dates would not be _exactly_ six months apart, and a good program will allow for this.
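For illustration only, and not part of the original reply: when the gap really is exactly six months, the pairing can also be sketched without any loops, using -by- groups. The variable names YYYYMM, RACE, SEX, BIRTHY, BIRTHM and test come from the question quoted below; mdate, phase, second and first are hypothetical helper variables. The sketch assumes that the four attributes plus the interview month identify at most one observation, which may well fail in real data, and it does not handle the inexact gaps mentioned above.

```stata
* Convert YYYYMM (e.g. 198103) to a Stata monthly date so gaps are counted in months
gen mdate = ym(floor(YYYYMM/100), mod(YYYYMM, 100))
format mdate %tm

* Assumption: one observation per attribute combination and month.
* -isid- stops with an error if this does not hold.
isid RACE SEX BIRTHY BIRTHM mdate

* Observations exactly 6 months apart share the same value of mod(mdate, 6),
* so within each attribute/phase group they are adjacent once sorted by date.
gen byte phase = mod(mdate, 6)
gen byte second = 0
bysort RACE SEX BIRTHY BIRTHM phase (mdate): ///
    replace second = (mdate - mdate[_n-1] == 6) & !second[_n-1] if _n > 1

* An observation is the first of a pair if the next one was marked as second
by RACE SEX BIRTHY BIRTHM phase: gen byte first = (second[_n+1] == 1)

* The running count goes up only when a pair starts; unmatched rows stay missing
gen long test = sum(first) if first | second
```

With the two example rows from the question (198103 and 198109, same attributes), both rows would receive test == 1. Because -replace- works through the observations in order, a row already used as the second of a pair is not reused as the first of another, so a run such as t, t+6, t+12 yields one pair and one unmatched row.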
Steve
sjsamuels@gmail.com

On Feb 4, 2011, at 1:38 PM, Ulrich Brandt wrote:

Hello,

I have written a little do-file in order to identify persons in a panel dataset, because unfortunately they have no unique ID. I know that every person appears only twice, with a time gap of 6 months, so it's a kind of "rotating panel". In my approach I am trying to identify persons by comparing attributes which can't change over time, like race, month and year of birth, sex etc., together with an observation difference of 6 months. The subparts of my code work properly, but if I try to run them together the results aren't plausible. I am using Stata 10.1, Windows XP (32 bit).

For example, I have got one person in the dataset:

YYYYMM  BIRTHM    BIRTHY  SEX   RACE
198103  February  1895    Male  White ex
198109  February  1895    Male  White ex

February, Male and White ex are only value labels. The numeric values behind the labels are in this case Male = 1, White ex = 1, February = 2:

YYYYMM  BIRTHM  BIRTHY  SEX  RACE
198103  2       1895    1    1
198109  2       1895    1    1

I used some nested foreach loops to generate all combinations of properties and a while loop to generate the different dates with a 6-month time gap. If two observations have these generated properties, they should get an ID called "test" by using an upcounting local macro called "i". Here is the code. The example posted generates just the time period 1980-1982.

---------------------------------------------------------------------------------------------
gen test=.
local i = 1
levelsof RACE , local (value_race)
levelsof SEX , local (value_sex)
levelsof BIRTHY , local (value_birthy)
levelsof BIRTHM , local (value_birthm)
foreach race of local value_race{
    local race = `race'
    foreach sex of local value_sex{
        local sex = `sex'
        foreach birthy of local value_birthy{
            local birthy = `birthy'
            foreach birthm of local value_birthm{
                local birthm = `birthm'
                local year 1980
                while (`year'< 1982) {
                    forvalues month = 1/12{
                        if (`month' <=3) {
                            local datefirst "`year'0`month'"
                            local datelast "`year'0`month'+6"
                        }
                        if (`month' >=4 & `month'<=6) {
                            local datefirst "`year'0`month'"
                            local datelast "`year'0`month'+6"
                        }
                        if (`month' >=7 & `month' <=9){
                            local datefirst "`year'0`month'"
                            local datehelp = `year'+1
                            local datelast "`datehelp'0`month'-6"
                        }
                        if (`month' >=10 & `month' <=12){
                            local datefirst "`year'`month'"
                            local datehelp = `year'+1
                            local datelast "`datehelp'`month'-6"
                        }
                        if (`month' ==12){
                            local year = (`year')+1
                        }
                        replace test=`i++' if (RACE == `race' & SEX == `sex' & BIRTHY == `birthy' & BIRTHM == `birthm') & (YYYYMM == (`datefirst') | YYYYMM == (`datelast'))
                    }
                }
            }
        }
    }
}

The code generates combinations like

1118828--198111--198205   (for example White ex, Male, 1882, August, 198111, 198205)
1118828--198112--198206
1118829--198001--198007
1118829--198002--198008

and so on.

I have got two problems when I use this code with my data. First, I suppose that the macro "i" counts up every time the loop passes through, even if there is no observation with this combination, because the output contains very high ID numbers. But I want it to count up only when a combination like this exists. Secondly, I know that there is a logical error in the "replace" line at the end, but I have not found my mistake. I want every two observations from the same person to get the same ID. But with this code the macro counts up and generates an ID for 1118828--198111 or 1118828--198205 or 1118828--198111--198205. My goal is that it only generates an ID for 1118828--198111--198205.
In other words, it should only count up and generate an ID when two observations with the same properties exist. I hope that someone has some suggestions on how to fix my problem. I hope the way I posted everything is right and understandable; if not, please correct me. It's my first time posting on Statalist.

Best regards
Ulrich Brandt

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/