Title | True and false in Stata |

Author | Nicholas J. Cox, Durham University, UK |

Most computer languages have some way of indicating and working with what is true and what is false, but not all languages choose exactly the same way. Stata follows two rules, the second of which may be considered as a generalization of the first. I will state the rules, and then we will look at each in turn.

- Rule 1: Logical or Boolean expressions evaluate to 0 if false, 1 if true.
- Rule 2: Logical or Boolean arguments, such as the argument to
**if** or **while**, may take on any value, not just 0 or 1; 0 is treated as false and any other numeric value as true.
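
As a quick check of both rules, a minimal sketch like the following can be run from a do-file. The numbers and message strings are arbitrary and chosen only for illustration.

```stata
* Rule 1: a logical expression evaluates to 1 if true, 0 if false
display 2 == 2
display 2 == 3

* Rule 2: any numeric argument other than 0 is treated as true
if 42 {
    display "42 is treated as true"
}
if 0 {
    display "this line is never shown"
}
```

The two **display** commands show 1 and 0, and only the first message appears.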

First, consider the results of logical or Boolean expressions. (George Boole
worked on logic and probability in the nineteenth century. For more about George
Boole, see
http://www-history.mcs.st-and.ac.uk/~history/Mathematicians/Boole.html.)
In Stata, these expressions use one or more of various relational
and logical operators. The operators **==**, **~=**, **!=**,
**>**, **>=**, **<**, and **<=** are used to test
equality or inequality. The operators **&**, **|**, **~**, and **!** are used to
indicate "and", "or", and "not". It is a matter of taste whether you use
**~** or **!** to indicate negation. In this FAQ, we use **!**.
If you want to learn more about any of these, see
**help operators**.

For example, in the auto dataset, the expression **foreign == 1** will be
true for those observations where the variable **foreign** is 1 and
false otherwise. The double equal sign **==** is used whenever you wish
to test for equality; compare the use of the single equal sign **=** for
assignment. As a second example, the expression **2 == 2** is always
true. That may not seem helpful or instructive, but below we will see a use
for expressions that are necessarily always true. More complicated
expressions can readily be constructed: **foreign == 1 & rep78 == 4**
will be true whenever **foreign == 1** and **rep78 == 4**. Typing

. count if foreign == 1 & rep78 == 4

shows that there are nine such cars in the **auto** dataset. (Incidentally, the
**count**
command may seem trivial, yet it is a simple way of getting answers to some
basic questions about your data.)

Logical expressions have numerical values, which can be immensely useful. In Stata, the rule is that false logical expressions have value 0 and true logical expressions have value 1. Thus logical expressions may be used to generate indicator variables (also often called binary, dichotomous, dummy, logical, or Boolean, depending on tribal jargon), which have values 0 or 1. The command

. generate himpg = mpg > 30

will generate a new variable that is 1 whenever **mpg** is greater than
30, and 0 otherwise. Two wrinkles should now be mentioned. What if
**mpg** were missing? The rule is that Stata treats numeric missing
values as higher than any other numeric value, so missing would certainly
qualify as greater than 30, and any observation with **mpg** missing
would be assigned 1 for this new variable. This rule leads to the next
wrinkle: typing

. generate himpg = mpg > 30 if mpg < .

would assign 1 if **mpg** were greater than 30 but not missing; 0 if
**mpg** were not greater than 30; and missing if **mpg** were missing.
The logic is that you did not say what result you wanted if **mpg** were
missing; in the absence of instructions, Stata will shrug its shoulders in
the only way it knows, assigning a result of missing. The same logic would
apply if you were only interested in domestic cars:

. generate himpg = mpg > 30 if foreign == 0

If **foreign** were not equal to 0, then the result would be missing.
Otherwise, the result would be 1 or 0 according to whether **mpg** was or
was not greater than 30.
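
Because **mpg** has no missing values in the auto dataset, the two wrinkles are easier to see with **rep78**, which does have some. The sketch below makes that substitution purely for illustration; the variable names **hirep** and **hirep2** are made up for the example.

```stata
sysuse auto, clear

* Without a qualifier, missing rep78 counts as greater than 3
generate hirep = rep78 > 3

* With the qualifier, observations with missing rep78 get missing
generate hirep2 = rep78 > 3 if rep78 < .

* Check both claims; assert is silent when the claim holds
assert hirep == 1 if missing(rep78)
assert missing(hirep2) if missing(rep78)
assert hirep == hirep2 if !missing(rep78)

* Cross-tabulate the two versions, showing missing as a category
tabulate hirep hirep2, missing
```

The only observations on which the two variables disagree are those with **rep78** missing.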

The numerical value of logical
expressions proves useful whenever we want to count something. Suppose we want to
create a new variable in which we will put the frequencies of **mpg**
being greater than 30, by categories of **rep78**:

. sort rep78
. by rep78: generate nhimpg = sum(mpg > 30)
. by rep78: replace nhimpg = nhimpg[_N]

In the second statement, the function **sum()** produces a cumulative or
running sum of **mpg > 30**. If **mpg > 30**, 1 is added to the
sum; otherwise, 0 is added. This statement yields a running count of the
number of observations for which **mpg > 30**. In the third statement,
we replace the running count with its last value, the total count. This
process is all done within the framework of
**by**, for which data
must be **sort**ed on **rep78**, which is done first. Under
**by:**, the **generate** is carried out separately for each group of
**rep78**. Similarly, the **replace** is done separately for each
group of **rep78**. (You can also save a statement by using
**by** ... **, sort**, but that is incidental to the main idea.)

As it happens, there is a quicker way to do the above commands with
**egen**:

. egen nhimpg = total(mpg > 30), by(rep78)

The built-in function **sum()** produces cumulative or running sums,
whereas the **egen** function **total()** produces just sums.
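
To see that the two routes agree, a minimal sketch along these lines can be run from a do-file. The name **nhimpg2** is invented here for the comparison, and **bysort** is used as the one-line form of sorting and then using **by**.

```stata
sysuse auto, clear

* Route 1: running count within each group, then keep the group total
bysort rep78: generate nhimpg = sum(mpg > 30)
by rep78: replace nhimpg = nhimpg[_N]

* Route 2: the group total directly
egen nhimpg2 = total(mpg > 30), by(rep78)

* The two variables should be identical in every observation
assert nhimpg == nhimpg2
```

If the **assert** passes silently, both approaches have produced the same count in every group of **rep78**.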

Here we use the fact that there are no missing values of
**mpg** in the **auto** dataset. And, whenever you know this is
true of a variable in your data, you too can ignore the possibility of
missing values. But a more general method for counting observations greater
than some threshold is to use
**total(***varname* **>** *threshold* **&** *varname* **< .)**. That is a safe and never sorry method
whenever you want to exclude missing values. (Of course, if missing means in
practice "too high to be measured", then you might want to include missing.)
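
The difference between the naive and the safe count shows up as soon as a variable has missing values. The sketch below uses **rep78** from the auto dataset as an assumed example, with made-up names **nhigh_naive** and **nhigh_safe**.

```stata
sysuse auto, clear

* Naive: missing rep78 is counted as greater than 3
egen nhigh_naive = total(rep78 > 3), by(foreign)

* Safe: missing rep78 is excluded explicitly
egen nhigh_safe = total(rep78 > 3 & rep78 < .), by(foreign)

* The safe count can never exceed the naive count
assert nhigh_safe <= nhigh_naive

* One row per category of foreign, showing both counts
tabdisp foreign, cellvar(nhigh_naive nhigh_safe)
```

Any gap between the two columns is the number of observations with **rep78** missing in that category.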

Now consider what happens if you type something like

. list mpg if foreign == 1

Stata lists **mpg** for those observations for which **foreign** is
equal to 1 (and does not **list** them if this is not so).
A more long-winded explanation of the same process is that
Stata lists **mpg** whenever the logical expression **foreign == 1**
is true, that is, evaluates to 1.

That may look like the same idea in a different form. It is, but there are extra twists. Consider now

. list mpg if foreign

There are no relational or logical operators in sight, but Stata is
broad-minded here. It will still try its best to find a way of deciding on
true or false; in fact, it will accept any argument that evaluates to a
number not 0 as true, and any argument that evaluates to 0 as
false. If the mathematical or computer jargon "argument" is new
to you, think of it here as indicating whatever is fed to **if**.

For a numeric variable such as **foreign**, Stata looks at the values of
that variable: any value that is not 0 is treated as true and 0 as false. In other words,

. *whatever* if foreign

and

. *whatever* if foreign != 0

are exactly equivalent. This is true for any numeric variable. In practice, this gives you a useful shortcut whenever you have an indicator variable that takes only the values 0 or 1. The two statements

. list mpg if foreign == 1
. list mpg if foreign

are equivalent in practice in the **auto** dataset. In the first
statement, Stata evaluates the expression **foreign == 1**, and then
executes the action indicated (to **list**) if and only if the expression
is true, or evaluates numerically to 1. In the second statement, Stata looks
at the values of the variable **foreign**, and then executes the action
if and only if the value is a number not 0. In the auto dataset,
**foreign** is not 0 when and only when it is equal to 1, so the two
conditions are satisfied by exactly the same observations. Over time this
will save you many keystrokes when you are working with indicator variables,
and it will let you type Stata syntax close to the way you are thinking,
say, **if female** or even **if !female**. (The **!** is a way of
reversing the choice: **!** flips any value not 0 to 0, and any value 0
to 1.) But remember that numeric missings count as not 0 because Stata treats
them as larger than any nonmissing number.
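
The shortcut and its one pitfall can be checked directly. This is a sketch using the auto dataset; **hirep** is a hypothetical indicator created here only so that there is a 0/1 variable with missing values to test.

```stata
sysuse auto, clear

* These two conditions select the same observations
count if foreign
local n_short = r(N)
count if foreign == 1
assert r(N) == `n_short'

* !foreign selects all the remaining observations
count if !foreign
assert r(N) + `n_short' == _N

* Pitfall: an indicator with missing values
generate byte hirep = rep78 > 3 if rep78 < .
count if hirep
count if hirep == 1
```

The last two counts differ, because **if hirep** also selects the observations where **hirep** is missing.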

You can always check, either interactively or in a program, that a variable
has only the values 0 and 1 by using
**assert**:

. assert *varname* == 0 | *varname* == 1

If *varname* were equal to any other value, Stata would report that the
assertion is false. If you typed, perhaps by accident,

. list mpg if rep78

you would get a list for all observations, because **rep78** is never 0
(and its missing values also count as not 0).
It is the same logic.
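
Both points can be tried out in a short do-file. This is only a sketch: **capture** is used here so that the failed assertion does not stop the do-file.

```stata
sysuse auto, clear

* foreign is a true 0/1 indicator, so this passes silently
assert foreign == 0 | foreign == 1

* rep78 is not an indicator, so the same check fails;
* capture keeps the do-file running and stores the return code in _rc
capture assert rep78 == 0 | rep78 == 1
display _rc

* Because rep78 is never 0, "if rep78" selects every observation
count if rep78
assert r(N) == _N
```

A nonzero return code from the captured **assert** signals that the variable is not a pure 0/1 indicator.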

If the argument is just a number, the same logic still applies. This
can be useful with the **if** command. For example, you could count
missing values and take some action only if one or more missing values were
present. It can also be useful with the
**while** command,
which is more of a programmer's command and which we will illustrate in more
detail. **while 1** gives you an endless loop: the **1** is arbitrary
here, as any number not 0 would do. Presumably, within your otherwise
endless loop, you will add some test that gets Stata out of the loop, say,
with **continue, break**.
A related technique is to set a flag and to exit the loop only if and when
that flag has been changed:

. local worktodo = 1
. while `worktodo' {
        program statements, including setting `worktodo' to 0 when task completed
  }
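
A concrete version of these ideas might look like the following do-file sketch. The counters **i** and **j**, the stopping values 3 and 5, and the messages are all arbitrary choices for illustration. The second loop is left with **continue, break**, the form of **continue** that exits a loop.

```stata
sysuse auto, clear

* A number as the argument to if: act only when the count is not 0
count if missing(rep78)
if r(N) {
    display "rep78 has " r(N) " missing values"
}

* Flag-controlled loop: runs until worktodo is set to 0
local worktodo = 1
local i = 0
while `worktodo' {
    local ++i
    if `i' >= 3 {
        local worktodo = 0
    }
}
display "flag loop ran `i' times"

* Endless loop, left explicitly once the test is satisfied
local j = 0
while 1 {
    local ++j
    if `j' == 5 {
        continue, break
    }
}
display "endless loop left after `j' passes"
```

The flag version makes the stopping rule visible in the **while** line itself, whereas the **while 1** version keeps the exit test inside the body of the loop.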

Finally, if you were to supply, perhaps by accident, the name of a string
variable or a text string as an argument to **if** or **while**, there
would be an error message, as Stata cannot interpret either as a numeric
argument. Only numeric arguments can be considered true or false.
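
The error is easy to provoke safely. In this sketch, **capture noisily** is used so that the error message is shown but the do-file carries on; **make** is the string variable in the auto dataset.

```stata
sysuse auto, clear

* make is a string variable, so it cannot serve as a condition;
* capture noisily shows the error message but lets the do-file continue
capture noisily count if make
display "return code: " _rc

* A string comparison, by contrast, is a logical expression and is fine
count if make != ""
```

The comparison works because its result is numeric, 0 or 1, even though the variable being compared is a string.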