Jane Fry

statalist@hsphsun2.harvard.edu

st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]

Wed, 7 Sep 2011 14:04:50 +1000

Hi,

I'm fairly new to data manipulation in Stata and I have a query: I'd like to set up an indicator variable based on the sum of the values in a selection of other variables.

In my dataset I have variables on individual characteristics (like birth month and year) and a series of binary variables on labour force status (in/out) for consecutive months from Aug 2006 to Jan 2010: LFS0806 LFS0906 LFS1006 ... LFS1109 LFS1209 LFS0110.

I would like to create a binary indicator variable to show whether or not an individual is in the labour force for 6 consecutive months -- e.g. LFS0107, ... , LFS0607 all equal 1. The tricky bit is that the 6-month window for each individual ends in the month when they turn 25 -- i.e. the window shifts according to birthday.

I have set up an 'initial date' identifier variable (date1) that tells me when to begin the window and a 'final date' identifier variable (date2) that tells me when to end the window. Both date1 and date2 are string variables of the form "MMYY". For example:

for the first observation, date1="0107" and date2="0607", so LFS0107 ... LFS0607 are relevant here;
for the next observation, date1="0906" and date2="0307", so LFS0906 ... LFS0307 are relevant here.

I think what I need to do is generate a new variable X=. and then replace its value (for each individual) with 1 if the sum of the relevant LFS variables is 6 (i.e. every LFS variable in the window is 1) and with 0 otherwise.

Trouble is, I don't know how to do it. I thought something like

    egen X = rowtotal("LFS"+date1 - "LFS"+date2)

might work but I was wrong! The variable names to sum over differ from one observation to the next, and I can't see how to tell egen that.

Is there anyone who can help?

Many thanks,
Jane.
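One possible way to approach this, given as an untested sketch rather than a definitive solution: instead of building a different variable list for each observation, loop once over all the LFS variables and let each observation count only the months that fall inside its own window. The sketch assumes that every year is 20YY, that the LFS variables are coded 0/1, that the LFS* wildcard matches only the monthly variables named LFSMMYY, and that "in the labour force throughout" means every month from date1 to date2 inclusive. The names m1, m2 and nmonths are made up for illustration.

```stata
* Convert the "MMYY" strings to Stata monthly dates (assumes years are 20YY)
gen int m1 = ym(2000 + real(substr(date1, 3, 2)), real(substr(date1, 1, 2)))
gen int m2 = ym(2000 + real(substr(date2, 3, 2)), real(substr(date2, 1, 2)))

* Count the months inside each observation's window with LFS equal to 1
gen byte nmonths = 0
foreach v of varlist LFS* {
    * Recover the month this variable refers to from its name, e.g. LFS0107
    local m = ym(2000 + real(substr("`v'", 6, 2)), real(substr("`v'", 4, 2)))
    replace nmonths = nmonths + (`v' == 1) if inrange(`m', m1, m2)
}

* 1 if in the labour force in every month of the window, 0 otherwise;
* left missing when either window date could not be converted
gen byte X = (nmonths == m2 - m1 + 1) if !missing(m1, m2)
```

With this coding, a window that starts before Aug 2006 or ends after Jan 2010 can never reach the full count, so X would be 0 for those observations; whether they should be set to missing instead is a judgement call that depends on the analysis.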

