Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]

From   Nick Cox <>
Subject   Re: st: binary indicator for differing subsets of variables [SEC=UNCLASSIFIED]
Date   Wed, 7 Sep 2011 07:38:31 +0100

That's interestingly tricky. Here's one way to do it. Let's first
initialise a variable

gen myindicator = 0

Let's get the (string) suffixes 0806 to 0110 spelled out one by one to work with:

unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all

So we want to add in each LFS variable if and only if any of the
-date?- variables gives the corresponding date:

qui foreach v of local LFS {
     replace myindicator = myindicator + LFS`v' if inlist("`v'", ///
          date1, date2, date3, date4, date5, date6)
}

replace myindicator = myindicator == 6

You could try looping over the -date?- variables instead, but I think
the above should work.
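The loop above assumes six string variables, one for each month of the window, whereas the original question has only a start (date1) and an end (date2). Here is a minimal sketch of one way to build them. It assumes that date1 holds the first month as "MMYY", that all years are 2000 or later, and that the window runs six months forward from date1. The names start and win1-win6 and the toy data are hypothetical and not from the thread.

```stata
* Toy data (hypothetical): date1 is the first month of each window
clear
input str4 date1 LFS0906 LFS1006 LFS1106 LFS1206 LFS0107 LFS0207 LFS0307
"0906" 1 1 1 1 1 1 0
"1006" 1 1 1 1 1 1 0
end

* Convert "MMYY" to a Stata monthly date (assumes years 2000-2099)
gen start = ym(2000 + real(substr(date1, 3, 2)), real(substr(date1, 1, 2)))

* Spell out the six months of the window as "MMYY" strings win1-win6
forvalues j = 1/6 {
    gen str4 win`j' = string(month(dofm(start + `j' - 1)), "%02.0f") + ///
                      string(mod(year(dofm(start + `j' - 1)), 100), "%02.0f")
}

* Same loop as above, with win1-win6 in place of date1-date6
gen myindicator = 0
unab LFS : LFS*
local LFS : subinstr local LFS "LFS" "", all
quietly foreach v of local LFS {
    replace myindicator = myindicator + LFS`v' if inlist("`v'", ///
        win1, win2, win3, win4, win5, win6)
}
replace myindicator = myindicator == 6

list date1 win6 myindicator
```

With these toy data the first observation ends up with myindicator equal to 1 and the second with 0, because LFS0307 is 0 inside the second window.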

See on the -inlist()- trick

and more generally

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

for a survey of working row-wise. The best single line of advice,
however, is to -reshape- panel data like yours to long, as most things
are easier that way.
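As an illustration of the -reshape- route, here is a minimal sketch on toy data. The identifier id, the names mmyy, mdate, start and nin, and the data themselves are hypothetical. It also assumes that date1 is the first month of the window as "MMYY", that years are 2000 or later, and that the window runs six months forward.

```stata
* Toy data (hypothetical): one row per person, wide layout
clear
input str4 date1 LFS0906 LFS1006 LFS1106 LFS1206 LFS0107 LFS0207 LFS0307
"0906" 1 1 1 1 1 1 0
"1006" 1 1 1 1 1 1 0
end
gen long id = _n

* One row per person-month; the "MMYY" suffix goes into string variable mmyy
reshape long LFS, i(id) j(mmyy) string

* Monthly dates for each row and for the start of each person's window
gen mdate = ym(2000 + real(substr(mmyy, 3, 2)), real(substr(mmyy, 1, 2)))
gen start = ym(2000 + real(substr(date1, 3, 2)), real(substr(date1, 1, 2)))
format mdate start %tm

* Count months in the labour force within the six-month window
bysort id: egen nin = total(LFS * inrange(mdate, start, start + 5))
gen byte myindicator = nin == 6
```

Once the data are long, the window is just a condition on a numeric monthly date, so no string matching of variable names is needed.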


On Wed, Sep 7, 2011 at 5:04 AM, Fry, Jane <> wrote:

> I'm a bit new to data manipulation using Stata and I have a query: I'd like to set up an indicator variable based on the sum of the values in a selection of other variables.
> So, in my dataset I have variables on individual characteristics (like birth month and year) and a series of binary variables on labour force status (in/out) for consecutive months and years from Aug 2006 - Jan 2010:
> LFS0806 LFS0906 LFS1006 ... LFS1109 LFS1209 LFS0110.
> I would like to create a binary indicator variable to show whether or not an individual is in the labour force for 6 consecutive months --
> e.g. LFS0107, ... , LFS0607=1.
> The tricky bit is that the 6 month window for each individual ends in the month when they turn 25 -- i.e. the window shifts according to birthday.
> I have set up an 'initial date' identifier variable (date1) that tells me when to begin the window and a 'final date' identifier variable (date2) that tells me when to end the window. So date1 and date2 are string variables of the form "MMYY".
> e.g. for the first observation, date1="0107" and date2="0607", so LFS0107 ... LFS0607 are relevant here.
>        for the next observation, date1="0906" and date2="0307", so LFS0906 ... LFS0307 are relevant here.
> I think what I need to do is generate a new variable X=.  and then replace its values (for each individual) with a 1 or 0 if the sum of the relevant LFS variables is 6.
> i.e. the sum of LFSMMYY to LFS(MM+6)YY = 6 (or each LFS is 1).
> Trouble is, I don't know how to do it. I thought something like an egen X = rowtotal("LFS"+date1 - "LFS"+date2) might work but I was wrong! Is there anyone who can help?
