
# Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]
Date: Wed, 7 Sep 2011 07:38:31 +0100

```That's interestingly tricky. Here's one way to do it. Let's first
initialise a variable

gen myindicator = 0

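Let's get the (string) suffixes 0806 to 0110 spelled out one by one to
work with. -unab- expands the wildcard into the full list of variable
names, and -subinstr- then strips the "LFS" prefix from each: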

unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all

So we want to add in each LFS variable if and only if any of the
-date?- variables gives the corresponding date. This assumes string
variables date1-date6, one for each month of the window:

qui foreach v of local LFS {
    replace myindicator = myindicator + LFS`v' if inlist("`v'", date1, date2, date3, date4, date5, date6)
}

replace myindicator = myindicator == 6

You could try looping over the -date?- instead, but I think the above
should work.

On the -inlist()- trick, see

http://www.stata.com/statalist/archive/2011-04/msg00618.html

and more generally

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
(help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
Q1/09   SJ 9(1):137--157
shows how to exploit functions, egen functions, and Mata
for working rowwise; rowsort and rowranks are introduced

for a survey of working row-wise. The best single line of advice,
however, is to -reshape- panel data like yours to long, as most things
are easier that way.

Nick

On Wed, Sep 7, 2011 at 5:04 AM, Fry, Jane <Jane.Fry@pc.gov.au> wrote:

> I'm a bit new to data manipulation using Stata and I have a query: I'd like to set up an indicator variable based on the sum of the values in a selection of other variables.
>
> So, in my dataset I have variables on individual characteristics (like birth month and year) and a series of binary variables on labour force status (in/out) for consecutive months and years from Aug 2006 - Jan 2010:
> LFS0806 LFS0906 LFS1006 ... LFS1109 LFS1209 LFS0110.
>
> I would like to create a binary indicator variable to show whether or not an individual is in the labour force for 6 consecutive months --
> e.g. LFS0107, ... , LFS0607=1.
> The tricky bit is that the 6 month window for each individual ends in the month when they turn 25 -- i.e. the window shifts according to birthday.
>
> I have set up an 'initial date' identifier variable (date1) that tells me when to begin the window and a 'final date' identifier variable (date2) that tells me when to end the window. So date1 and date2 are string variables of the form "MMYY".
>
> e.g. for the first observation, date1="0107" and date2="0607", so LFS0107 ... LFS0607 are relevant here.
>        for the next observation, date1="0906" and date2="0307", so LFS0906 ... LFS0307 are relevant here.
>
> I think what I need to do is generate a new variable X=.  and then replace its values (for each individual) with a 1 or 0 if the sum of the relevant LFS variables is 6.
> i.e. the sum of LFSMMYY to LFS(MM+6)YY = 6 (or each LFS is 1).
>
> Trouble is, I don't know how to do it. I thought something like an egen X = rowtotal("LFS"+date1 - "LFS"+date2) might work but I was wrong! Is there anyone who can help?
>
>

```
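
As a rough illustration of the closing advice to -reshape- to long, here is a minimal sketch on made-up data. It is not from the thread. It assumes an identifier variable id (not mentioned in the original question), uses only the window endpoints date1 and date2 as the question describes them, and reads two-digit years as 20YY. The names mmyy, mdate, start, stop, inwindow and nin are hypothetical helpers.

```stata
* Toy data in wide layout (made up for illustration; id is an assumed identifier)
clear
input long id str4 date1 str4 date2 LFS1206 LFS0107 LFS0207 LFS0307 LFS0407 LFS0507 LFS0607 LFS0707
1 "0107" "0607" 0 1 1 1 1 1 1 0
2 "1206" "0507" 1 1 1 0 1 1 1 1
end

* One row per person-month; the MMYY suffix is kept as the string variable mmyy
reshape long LFS, i(id) j(mmyy) string

* Turn the MMYY strings into Stata monthly dates (assumes years 2000-2099)
gen mdate = ym(2000 + real(substr(mmyy, 3, 2)), real(substr(mmyy, 1, 2)))
gen start = ym(2000 + real(substr(date1, 3, 2)), real(substr(date1, 1, 2)))
gen stop  = ym(2000 + real(substr(date2, 3, 2)), real(substr(date2, 1, 2)))
format mdate start stop %tm

* Count months in the labour force inside each person's own window
gen byte inwindow = inrange(mdate, start, stop)
bysort id: egen nin = total(LFS * inwindow)
gen byte myindicator = nin == 6

* Keep one line per person to inspect the result
bysort id (mdate): keep if _n == 1
list id date1 date2 nin myindicator
```

With these toy values, person 1 is in the labour force in all six window months and person 2 in only five, so only person 1 should end up with myindicator equal to 1. Once the dates are numeric monthly dates, a shifting window is just an -inrange()- condition, which is a large part of why the long layout tends to be easier here.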