Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]

Wed, 7 Sep 2011 07:38:31 +0100

That's interestingly tricky. Here's one way to do it.

Let's first initialise a variable

gen myindicator = 0

Let's get the (string) suffixes 0806-0110 spelled out one by one to work with

unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all

So we want to add in each LFS variable if and only if any of the
-date?- variables gives the corresponding date:

qui foreach v of local LFS {
        replace myindicator = myindicator + LFS`v' if inlist("`v'", date1, date2, date3, date4, date5, date6)
}

replace myindicator = myindicator == 6

You could try looping over the -date?- instead, but I think the above
should work.

See on the -inlist()- trick

http://www.stata.com/statalist/archive/2011-04/msg00618.html

and more generally

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

for a survey of working row-wise.

The best single line of advice, however, is to -reshape- panel data
like yours to long, as most things are easier that way.

Nick

On Wed, Sep 7, 2011 at 5:04 AM, Fry, Jane <Jane.Fry@pc.gov.au> wrote:

> I'm a bit new to data manipulation using Stata and I have a query: I'd like to set up an indicator variable based on the sum of the values in a selection of other variables.
>
> So, in my dataset I have variables on individual characteristics (like birth month and year) and a series of binary variables on labour force status (in/out) for consecutive months and years from Aug 2006 - Jan 2010:
> LFS0806 LFS0906 LFS1006 ... LFS1109 LFS1209 LFS0110.
>
> I would like to create a binary indicator variable to show whether or not an individual is in the labour force for 6 consecutive months --
> e.g. LFS0107, ... , LFS0607=1.
> The tricky bit is that the 6 month window for each individual ends in the month when they turn 25 -- i.e. the window shifts according to birthday.
>
> I have set up an 'initial date' identifier variable (date1) that tells me when to begin the window and a 'final date' identifier variable (date2) that tells me when to end the window. So date1 and date2 are string variables of the form "MMYY".
>
> e.g. for the first observation, date1="0107" and date2="0607", so LFS0107 ... LFS0607 are relevant here.
> for the next observation, date1="0906" and date2="0307", so LFS0906 ... LFS0307 are relevant here.
>
> I think what I need to do is generate a new variable X=. and then replace its values (for each individual) with a 1 or 0 if the sum of the relevant LFS variables is 6.
> i.e. the sum of LFSMMYY to LFS(MM+6)YY = 6 (or each LFS is 1).
>
> Trouble is, I don't know how to do it. I thought something like an egen X = rowtotal("LFS"+date1 - "LFS"+date2) might work but I was wrong! Is there anyone who can help?
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
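
The -inlist()- loop in the reply refers to six variables date1-date6, whereas the question describes only a start (date1) and an end (date2). The following is a minimal sketch of one way to bridge that, not necessarily what was intended. It assumes that every year is 20YY and that the window is the six months beginning at date1. The toy data and the names wstart, d and m1-m6 are made up for illustration.

```stata
* Toy data: names and values are made up for illustration
clear
input id str4 date1 str4 date2 LFS1206 LFS0107 LFS0207 LFS0307 LFS0407 LFS0507 LFS0607 LFS0707
1 "0107" "0607" 0 1 1 1 1 1 1 0
2 "1206" "0507" 1 1 1 0 1 1 1 1
3 "0207" "0707" 1 1 1 1 1 1 1 1
end

* Start of each window as a Stata monthly date (assumes all years are 20YY)
gen wstart = ym(2000 + real(substr(date1, 3, 2)), real(substr(date1, 1, 2)))

* m1-m6 (hypothetical names): the six window months as "MMYY" strings
forval k = 1/6 {
    gen d = dofm(wstart + `k' - 1)
    gen m`k' = string(month(d), "%02.0f") + string(mod(year(d), 100), "%02.0f")
    drop d
}

* The loop from the reply, with m1-m6 in place of date1-date6
gen myindicator = 0
unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all
quietly foreach v of local LFS {
    replace myindicator = myindicator + LFS`v' if inlist("`v'", m1, m2, m3, m4, m5, m6)
}
replace myindicator = myindicator == 6
list id date1 date2 myindicator
```

In this toy example only the second person has a month out of the labour force inside the window, so only that person should end up with myindicator equal to 0.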

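
The -reshape- advice in the reply can be sketched as follows. This is only an illustration under stated assumptions: the toy data are made up, an identifier id is assumed to exist, every year is assumed to be 20YY, and the names mmyy, m_mmyy, m_date1, m_date2 and nin are hypothetical.

```stata
* Toy data: names and values are made up for illustration
clear
input id str4 date1 str4 date2 LFS1206 LFS0107 LFS0207 LFS0307 LFS0407 LFS0507 LFS0607 LFS0707
1 "0107" "0607" 0 1 1 1 1 1 1 0
2 "1206" "0507" 1 1 1 0 1 1 1 1
3 "0207" "0707" 1 1 1 1 1 1 1 1
end

* Long layout: one row per person-month, with the suffix kept as the string mmyy
reshape long LFS, i(id) j(mmyy) string

* Monthly dates for each row and for the two window ends (assumes years are 20YY)
foreach v in mmyy date1 date2 {
    gen m_`v' = ym(2000 + real(substr(`v', 3, 2)), real(substr(`v', 1, 2)))
    format m_`v' %tm
}

* Count the months in the labour force that fall inside each person's window
egen nin = total(LFS * inrange(m_mmyy, m_date1, m_date2)), by(id)
gen byte myindicator = nin == 6
```

Once the data are long, the window is just a condition on a numeric monthly date, so no looping over variable names is needed. If windows can differ in length, comparing nin with m_date2 - m_date1 + 1 instead of 6 would test whether every month in the window is in the labour force.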
