Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joerg Luedicke <joerg.luedicke@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: st: Merging 2 Tricky Panel Datasets |
Date | Mon, 14 Mar 2011 21:13:58 -0400 |
On Mon, Mar 14, 2011 at 5:29 PM, Clifton Chow <clifton_chow@post.harvard.edu> wrote:
>
> A. Interview date: This is matched identically in both datasets, but the format for dataset 1 = mo/day/year, and for dataset 2 the month, day and year are broken out into separate variables.
>
>             dataset 1               dataset 2
>
> obs 1       04 12 09      obs 1     04/12/2009
> obs 2       12 14 10      obs 2     12/14/2010
>
> B. Interview sequence: This is the tricky part. Dataset 1 has a variable denoting interview sequence from 1-9, but dataset 2 has an interview sequence variable from 1-10, with 10 being the final interview conducted before discharge, which can map onto the final interview recorded in dataset 1.
>
>          Dataset 1              Dataset 2
>
>          ID   Seq               ID   Seq
> obs      1    1        obs      1    1
> obs      1    2        obs      1    2
> obs      2    1        obs      2    1
> obs      2    2        obs      2    2
> obs      2    3        obs      2    10
>
> This means that for individuals in dataset 2 without a sequence number 10, everything lines up perfectly between the two datasets (1-9). But for those with a sequence number 10, it can map onto any possible data point in dataset 1, depending on which is the individual's final interview as recorded in dataset 1.
>
> Does anyone have a program (either a loop or an if statement) that can handle data point 10 from dataset 2, so I can still merge both datasets without losing significant data from individuals who were discharged (those with data point 10)?

Re A: type -help date- to see how Stata deals with dates and times, and how you can convert from numeric values into dates and vice versa. For instance, you could split the date in dataset2 into three variables, as in dataset1, and then merge accordingly.

Re B: if I understand your problem correctly, this is probably easier. In dataset1, you can simply replace the last observation in each ID's sequence with 10; alternatively, in dataset2, replace the 10 with the previous number in the sequence plus 1.
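To illustrate point A, here is a minimal sketch of building one comparable Stata daily date in each file. It assumes, hypothetically, that dataset1 stores numeric variables `month`, `day` and a two-digit `year`, that dataset2 stores the date as a string such as "04/12/2009" in a variable `idate_str`, and that all two-digit years refer to 2000 or later. The names `idate` and `idate_str` are illustrative only.

```stata
* Sketch for point A: build one Stata daily date (days since 01jan1960)
* in each dataset so both files share a comparable key.
* Variable names (month, day, year, idate_str, idate) are hypothetical.

* Dataset 1: month, day and two-digit year in separate numeric variables
clear
input month day year
4 12 9
12 14 10
end
* assumption: two-digit years all refer to 2000 or later
generate idate = mdy(month, day, 2000 + year)
format idate %td
list
* in practice you would -save- dataset 1 here before loading dataset 2

* Dataset 2: the date is a string in mm/dd/yyyy form
clear
input str10 idate_str
"04/12/2009"
"12/14/2010"
end
* the "MDY" mask tells date() the order of month, day and year
generate idate = date(idate_str, "MDY")
format idate %td
list
```

If you would rather merge on three separate variables, as suggested above, the components can be recovered from a daily date with the `month()`, `day()` and `year()` functions (note that `year()` returns a four-digit year).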
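For the first option (in dataset1), you could write something like:

    gen seq2 = Seq
    bysort ID (Seq): replace seq2 = 10 if _n == _N

Note that this recodes the last interview of every ID in dataset1, so it only lines up for IDs that actually have a 10 in dataset2.

For the second option (in dataset2), it could be:

    gen seq3 = Seq
    bysort ID (Seq): replace seq3 = Seq[_n-1] + 1 if _n == _N & Seq == 10

hth,
J.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/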
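Putting the pieces together, here is a minimal end-to-end sketch of the second option followed by the merge itself, using toy data shaped like the example in the question. It assumes that `ID` and `Seq` are numeric, that sequence numbers have no gaps, and Stata 11 or later for the `merge 1:1` syntax; the temporary file name `d2` and the helper variable `seq3` are illustrative.

```stata
* Sketch: apply the second option to dataset2, then merge on ID and Seq.
* Toy data mirror the example in the question.

* Dataset 2: sequence 1-9, plus 10 for the final (discharge) interview
clear
input ID Seq
1 1
1 2
2 1
2 2
2 10
end
generate seq3 = Seq
* recode a final 10 to the previous number in the sequence plus 1;
* if 10 is the person's only interview (_n == 1), it is left unchanged
bysort ID (Seq): replace seq3 = Seq[_n-1] + 1 if _n == _N & Seq == 10 & _n > 1
drop Seq
rename seq3 Seq
tempfile d2
save "`d2'"

* Dataset 1: sequence 1-9 only
clear
input ID Seq
1 1
1 2
2 1
2 2
2 3
end
* _merge == 3 marks observations found in both files
merge 1:1 ID Seq using "`d2'"
tabulate _merge
```

Any observations with `_merge` equal to 1 or 2 point to sequences that still do not line up, which is a quick check that the recode did what you expected.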