

Re: st: Evaluating a set of conditions

From	[email protected] (William Gould, StataCorp LP)
To	[email protected]
Subject	Re: st: Evaluating a set of conditions
Date	Thu, 17 Jun 2010 09:31:23 -0500

Thomas Speidel <[email protected]> is looking for "more efficient or
elegant alternatives to evaluate a set of conditions in Stata".
Thomas offers as a generic example

>     . do something if A==1 & (B>2 & B<.) & (C==1 | D!=2)

As Thomas notes, "these expression can become lengthy".  Thomas asks
about the -cond()- and -inrange()- functions, and others on the list
have already addressed those issues.  I want to suggest another 
approach for use in addition to more clever ways to specify the 
if condition itself.

It is not just that I sometimes have lengthy if conditions that need
to appear on the end of a command, but that the condition or
variations on it need to be specified in a sequence of related
commands.  Using Thomas's generic notation,

       . <cmd1> if A==1 & (B>2 & B<.) & (C==1 | D!=2)
       . <cmd2> if A==1 & (B>2 & B<.) & (C==1 | D!=2)
       . <cmd3> if        (B>2 & B<.)  
       . <cmd4> if                      (C==1 | D!=2)
       . <cmd5> if A==1 & (B>2 & B<.) & (C==1 | D!=2) & income<5000
       . ...

The chances are small that I will specify all the
conditions correctly.  In such cases, I create boolean variables that
make it easier to refer to subgroups.  For instance,

       . gen byte is_bgt2 = (B>2 & B<.)

       . gen byte is_ingroup = (C==1 | D!=2)

       . gen byte is_insub = A==1 & is_bgt2 & is_ingroup 

The above commands create new variables containing 1 if the condition is 
true and 0 otherwise.  With those boolean variables, I can now type 

       . <cmd1> if is_insub
       . <cmd2> if is_insub
       . <cmd3> if is_bgt2
       . <cmd4> if is_ingroup
       . <cmd5> if is_insub & income<5000
       . ...
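
As a concrete sketch of the same pattern, the commands below use the
auto dataset shipped with Stata.  The variable names -isrepgt2-,
-isingroup-, and -isinsub- and the cutoffs are made up for
illustration; they are not part of Thomas's problem.  The -assert-
lines check that the boolean reproduces the long condition before it
is used in place of it.

       . sysuse auto, clear

       . * rep78 contains missing values, so exclude them explicitly
       . gen byte isrepgt2 = (rep78>2 & rep78<.)

       . gen byte isingroup = (foreign==1 | mpg>=25)

       . gen byte isinsub = isrepgt2 & isingroup

       . * the boolean must contain only 0 and 1
       . assert inlist(isinsub, 0, 1)

       . * the boolean must agree with the long condition in every observation
       . assert isinsub == ((rep78>2 & rep78<.) & (foreign==1 | mpg>=25))

       . summarize price if isinsub
       . summarize price if isinsub & weight<3000

If either -assert- fails, Stata stops with an error, so a mistake in
the definition is caught once rather than repeated in every command.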

I named my boolean variables starting with "is_"; in real life, I do
the same except that I omit the underscore.  Anyway, I name the
variables -is*- even when the name is inelegant because I find that
later I will make fewer errors using them.  I know an -is*- variable
is a boolean variable.  I name my boolean variables in terms that are
meaningful to me, stated in substantive, conceptual terms, not
technical terms.

My variables might be -ismale- (and that would contain 1 if known to
be male); -isfemale- (known to be female); -isofinterest- (female,
under 30, has children in the home); -isworking- (known to have a job
paying $4/hour or more and hours worked last week known to be in
excess of 5); -ismadeit- (experimental subject who survived the
operation by 24 hours); and so on.

With those variables, I can more easily specify the subsample of the
population I want,

       . gen myvar = cond(isworking, ..., ...) if isofinterest

       . replace myvar = ... if !isofinterest

Each of my boolean variables has been constructed with careful
consideration of how I want to treat missing values.  In the above
example, I refer to isofinterest and !isofinterest, but it is not
uncommon for me to have variables isofinterest and isnotofinterest
when there are observations in the data that are neither isofinterest
nor isnotofinterest.
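
The treatment of missing values can be sketched with the auto dataset
shipped with Stata, where rep78 is missing in some observations.  The
names -isgoodnaive-, -isgood-, and -isnotgood- and the cutoff of 3 are
hypothetical.  Stata treats numeric missing as larger than any number,
so the naive comparison counts missing rep78 as "greater than 3".

       . sysuse auto, clear

       . * naive: equals 1 whenever rep78 is missing
       . gen byte isgoodnaive = rep78>3
       . assert isgoodnaive == 1 if missing(rep78)

       . * known to be good, and known not to be good
       . gen byte isgood    = (rep78>3 & rep78<.)
       . gen byte isnotgood = (rep78<=3)

       . * observations that are neither are exactly those with missing rep78
       . count if !isgood & !isnotgood
       . assert missing(rep78) if !isgood & !isnotgood

With both variables defined, -if isgood- and -if isnotgood- each leave
out the observations whose status is unknown, whereas -if !isgood-
would include them.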

Putting aside missing values, an equally important advantage of
creating and using boolean variables is that poor thinking is more
easily spotted when conditions are stated in terms of substantive
concepts rather than the messy details.

-- Bill
[email protected]