Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Evaluating a set of conditions

 From wgould@stata.com (William Gould, StataCorp LP) To statalist@hsphsun2.harvard.edu Subject Re: st: Evaluating a set of conditions Date Thu, 17 Jun 2010 09:31:23 -0500

```Thomas Speidel <thomas@tmbx.com> is looking for "more efficient or
elegant alternatives to evaluate a set of conditions in Stata".
Thomas offers as a generic example

>     . do something if A==1 & (B>2 & B<.) & (C==1 | D!=2)

As Thomas notes, "these expression can become lengthy".  Thomas asks
about the -cond()- and -inrange()- functions, and others on the list
have suggested approaches to use in addition to more clever ways to
specify the if condition itself.

It is not just that I sometimes have lengthy if conditions that need
to appear on the end of a command, but that the condition or
variations on it need to be specified in a sequence of related
commands.  Using Thomas's generic notation,

. <cmd1> if A==1 & (B>2 & B<.) & (C==1 | D!=2)
. <cmd2> if A==1 & (B>2 & B<.) & (C==1 | D!=2)
. <cmd3> if        (B>2 & B<.)
. <cmd4> if                      (C==1 | D!=2)
. <cmd5> if A==1 & (B>2 & B<.) & (C==1 | D!=2) & income<5000
. ...

The chances are small that I will specify all the conditions
correctly.  In such cases, I create boolean variables that
make it easier to refer to subgroups.  For instance,

. gen byte is_bgt2 = (B>2 & B<.)

. gen byte is_ingroup = (C==1 | D!=2)

. gen byte is_insub = A==1 & is_bgt2 & is_ingroup

The above commands create new variables containing 1 if the condition is
true and 0 otherwise.  With those boolean variables, I can now type

. <cmd1> if is_insub
. <cmd2> if is_insub
. <cmd3> if is_bgt2
. <cmd4> if is_ingroup
. <cmd5> if is_insub & income<5000
. ...

I named my boolean variables starting with "is_"; in real life, I do
the same except that I omit the underscore.  Anyway, I name the
variables -is*- even when the name is inelegant because I find that
later I will make fewer errors using them.  I know an -is*- variable
is a boolean variable.  I name my boolean variables in terms
meaningful to me, stated in substantive, conceptual terms, not
technical terms.

My variables might be -ismale- (and that would contain 1 if known to
be male), -isfemale- (known to be female); -isofinterest- (female,
under 30, has children in the home); -isworking- (known to have a job
paying $4/hour or more and hours worked last week known to be in
excess of 5); -ismadeit- (experimental who survived the operation by 24
hours); and so on.

With those variables, I can more easily specify the subsample of the
population I want,

. gen myvar = cond(isworking, ..., ...) if isofinterest

. replace myvar = ... if !isofinterest

Each of my boolean variables has been constructed carefully,
considering how I want to treat missing values.  In the above example,
I refer to isofinterest and !isofinterest, but it is not uncommon for
me to have variables isofinterest and isnotofinterest when there are
observations in the data that are neither isofinterest nor
isnotofinterest.

Putting aside missing values, an equally important advantage of
creating and using boolean variables is that poor thinking is more
easily spotted when conditions are stated in terms of substantive
concepts rather than the messy details.

-- Bill
wgould@stata.com
```
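The point about missing values and paired variables such as isofinterest and isnotofinterest can be illustrated with a small sketch. This is not from the original post: the data and the variable names female, age, and kids are hypothetical, chosen only to match the "female, under 30, has children in the home" description. It relies on the fact that Stata treats missing values as larger than any number, so each comparison has to be written so that a missing value cannot satisfy it by accident.

```stata
* Hypothetical example data; female, age, and kids are assumed names.
clear
input female age kids
1 25 2
1 35 1
0 28 0
. 22 1
1  . 3
end

* 1 only when every component is known and satisfies the condition.
* age<30 is false for missing age; kids<. guards against missing kids.
gen byte isofinterest = female==1 & age<30 & kids>0 & kids<.

* 1 only when the observation is known to fail at least one component.
* age<. keeps a missing age from counting as "30 or older".
gen byte isnotofinterest = female==0 | (age>=30 & age<.) | kids==0

* Observations with both variables equal to 0 are undetermined.
list, clean
count if !isofinterest & !isnotofinterest
```

In this sketch, an observation with a missing component ends up in neither group unless another component already settles the question, which is why !isofinterest and isnotofinterest are not interchangeable.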