

From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Evaluating a set of conditions
Date: Thu, 17 Jun 2010 09:31:23 -0500

Thomas Speidel <thomas@tmbx.com> is looking for "more efficient or elegant alternatives to evaluate a set of conditions in Stata". Thomas offers as a generic example

> . do something if A==1 & (B>2 & B<.) & (C==1 | D!=2)

As Thomas notes, "these expression can become lengthy". Thomas asks about the -cond()- and -inrange()- functions, and others on the list have already addressed those issues.

I want to suggest another approach, for use in addition to more clever ways of specifying the if condition itself. It is not just that I sometimes have lengthy if conditions that need to appear at the end of a command; the condition, or variations on it, often needs to be specified in a sequence of related commands. Using Thomas's generic notation,

        . <cmd1> if A==1 & (B>2 & B<.) & (C==1 | D!=2)
        . <cmd2> if A==1 & (B>2 & B<.) & (C==1 | D!=2)
        . <cmd3> if (B>2 & B<.)
        . <cmd4> if (C==1 | D!=2)
        . <cmd5> if A==1 & (B>2 & B<.) & (C==1 | D!=2) & income<5000
        . ...

The chances are small that I will specify all the conditions correctly. In such cases, I create boolean variables that make it easier to refer to subgroups. For instance,

        . gen byte is_bgt2 = (B>2 & B<.)
        . gen byte is_ingroup = (C==1 | D!=2)
        . gen byte is_insub = A==1 & is_bgt2 & is_ingroup

The above commands create new variables containing 1 if the condition is true and 0 otherwise. With those boolean variables, I can now type

        . <cmd1> if is_insub
        . <cmd2> if is_insub
        . <cmd3> if is_bgt2
        . <cmd4> if is_ingroup
        . <cmd5> if is_insub & income<5000
        . ...

I named my boolean variables starting with "is_"; in real life, I do the same except that I omit the underscore. Anyway, I name the variables -is*- even when the name is inelegant, because I find that later I make fewer errors using them: I know an -is*- variable is a boolean variable. I name my boolean variables in terms meaningful to me, stated in substantive, conceptual terms rather than technical terms.
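To make the approach above concrete, here is a minimal do-file sketch that runs it on a tiny made-up dataset. The values of A, B, C, D, and income are invented for illustration, and -list-, -summarize-, and -count- merely stand in for the generic <cmd1> ... <cmd5>. The last two -count- commands show why the B<. part of the condition matters: Stata treats missing values as larger than any number, so B>2 on its own is true when B is missing.

```stata
* Minimal sketch on hypothetical toy data; variable names follow the generic example
clear
input A B C D income
1 3 1 0 4000
1 5 0 2 6000
0 4 1 1 3000
1 . 1 2 2000
1 1 0 3 7000
end

* Boolean indicators: 1 if the condition is true, 0 otherwise.
* B < . excludes missing B, since missing sorts above every number.
generate byte is_bgt2    = (B > 2 & B < .)
generate byte is_ingroup = (C == 1 | D != 2)
generate byte is_insub   = (A == 1 & is_bgt2 & is_ingroup)

* The long conditions now reduce to short, named ones
list A B C D income if is_insub
summarize income if is_ingroup
count if is_insub & income < 5000

* Without the B < . guard, the missing B in observation 4 counts as "greater than 2"
count if B > 2
count if is_bgt2
```

Once the indicators exist, a mistake in a definition has to be fixed in one place rather than in every command that repeats the long expression.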
My variables might be -ismale- (containing 1 if known to be male), -isfemale- (known to be female), -isofinterest- (female, under 30, has children in the home), -isworking- (known to have a job paying $4/hour or more, and hours worked last week known to be in excess of 5), -ismadeit- (experimental subject who survived the operation by 24 hours), and so on. With those variables, I can more easily specify the subsample of the population I want:

        . gen myvar = cond(isworking, ..., ...) if isofinterest
        . replace myvar = ... if !isofinterest

Each of my boolean variables has been constructed after carefully considering how I want to treat missing values. In the above example, I refer to isofinterest and !isofinterest, but it is not uncommon for me to have both isofinterest and isnotofinterest variables when there are observations in the data that are neither isofinterest nor isnotofinterest.

Putting aside missing values, an equally important advantage of creating and using boolean variables is that poor thinking is more easily spotted when conditions are stated in terms of substantive concepts rather than messy details.

-- Bill
wgould@stata.com
