What is true and false in Stata?
Title: True and false in Stata
Author: Nicholas J. Cox, Durham University, UK
Date: October 2001; minor revisions February 2003, August 2005
Most computer languages have some way of indicating and working with what is
true and what is false, but not all languages choose exactly the same way.
Stata follows two rules, the second of which may be considered as a
generalization of the first. I will state the rules, and then we will look
at each in turn.
- Rule 1: Logical or Boolean expressions evaluate to 0 if false, 1 if true.
- Rule 2: Logical or Boolean arguments, such as the argument to if
or while, may take on any value, not just 0 or 1; 0 is treated as
false and any other numeric value as true.
Rule 1: Logical or Boolean expressions evaluate to 0 if false, 1 if true
First, consider the results of logical or Boolean expressions. (George Boole
worked on logic and probability in the nineteenth century. For more about George
Boole, see
http://www-history.mcs.st-and.ac.uk/~history/Mathematicians/Boole.html.)
In Stata, these expressions use one or more of various relational
and logical operators. The operators ==, ~=, !=,
>, >=, <, and <= are used to test
equality or inequality. The operators &, |, ~, and ! are used to
indicate "and", "or", and "not". It is a matter of taste whether you use
~ or ! to indicate negation. In this FAQ, we use !.
If you want to learn more about any of these, see help operators.
For example, in the auto dataset, the expression foreign == 1 will be
true for those observations where the variable foreign is 1 and
false otherwise. The double equal sign == is used whenever you wish
to test for equality; compare the use of the single equal sign = for
assignment. As a second example, the expression 2 == 2 is always
true. That may not seem helpful or instructive, but below we will see a use
for expressions that are necessarily always true. More complicated
expressions can readily be constructed: foreign == 1 & rep78 == 4
will be true whenever foreign == 1 and rep78 == 4. Typing
. count if foreign == 1 & rep78 == 4
shows that there are nine such cars in the auto dataset. (Incidentally, the
count command may seem trivial, yet it is a simple way of getting answers to
some basic questions about your data.)
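As a minimal sketch of Rule 1, assuming the auto dataset shipped with Stata is loaded with sysuse, display shows the numeric value of a logical expression directly. The commands are written as a do-file fragment, so they appear without the dot prompt.

```stata
* Load the auto dataset shipped with Stata
sysuse auto, clear

* A true expression evaluates to 1, a false one to 0
display 2 == 2
display 2 == 3

* count leaves the number of matching observations in r(N)
count if foreign == 1 & rep78 == 4
display r(N)
```

The last line should show the nine cars mentioned above.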
Logical expressions have numerical values, which can be immensely useful. In
Stata, the rule is that false logical expressions have value 0 and true
logical expressions have value 1. Thus logical expressions may be used to
generate indicator variables (also often called binary, dichotomous, dummy,
logical, or Boolean, depending on tribal jargon), which have values 0 or 1.
The command
. generate himpg = mpg > 30
will generate a new variable that is 1 whenever mpg is greater than
30, and 0 otherwise. Two wrinkles should now be mentioned. What if
mpg were missing? The rule is that Stata treats numeric missing
values as higher than any other numeric value, so missing would certainly
qualify as greater than 30, and any observation with mpg missing
would be assigned 1 for this new variable. This rule leads to the next
wrinkle: typing
. generate himpg = mpg > 30 if mpg < .
would assign 1 if mpg were greater than 30 but not missing; 0 if
mpg were not greater than 30; and missing if mpg were missing.
The logic is that you did not say what result you wanted if mpg were
missing; in the absence of instructions, Stata will shrug its shoulders in
the only way it knows, assigning a result of missing. The same logic would
apply if you were only interested in domestic cars:
. generate himpg = mpg > 30 if foreign == 0
If foreign were not equal to 0, then the result would be missing.
Otherwise, the result would be 1 or 0 according to whether mpg was or
was not greater than 30.
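The two wrinkles can be seen side by side in a short sketch. Because mpg has no missing values in the auto dataset, the sketch uses rep78 instead, which is missing for a few cars. The variable names hirep and hirep2 and the threshold of 3 are made up for illustration.

```stata
sysuse auto, clear

* Missing counts as higher than any number, so observations with
* rep78 missing are assigned 1 here
generate byte hirep = rep78 > 3

* Restricting to nonmissing values leaves the result missing
* wherever rep78 is missing
generate byte hirep2 = rep78 > 3 if rep78 < .

* Cross-tabulate the two indicators, showing missing values as well
tabulate hirep hirep2, missing
```

The two variables should differ only in the observations for which rep78 is missing.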
The numerical values of logical expressions also prove useful when we want
to count something. Suppose we want to create a new variable holding the
number of observations with mpg greater than 30, by categories of rep78:
. sort rep78
. by rep78: generate nhimpg = sum(mpg > 30)
. by rep78: replace nhimpg = nhimpg[_N]
In the second statement, the function sum() produces a cumulative or
running sum of mpg > 30. If mpg > 30, 1 is added to the
sum; otherwise, 0 is added. This statement yields a running count of the
number of observations for which mpg > 30. In the third statement,
we replace the running count with its last value, the total count. This
process is all done within the framework of by, for which data must be
sorted on rep78, which is done first. Under by:, the generate is carried
out separately for each group of rep78. Similarly, the replace is done
separately for each group of rep78. (You could also save a statement by
using by ..., sort, but that is incidental to the main idea.)
As it happens, there is a quicker way to do the same thing with egen:
. egen nhimpg = total(mpg > 30), by(rep78)
The built-in function sum() produces cumulative or running sums,
whereas the egen function total() produces just sums.
Here we use the fact that there are no missing values of
mpg in the auto dataset. Whenever you know this is
true of a variable in your data, you too can ignore the possibility of
missing values. A more general method for counting observations greater
than some threshold is to use
total(varname > threshold & varname < .). That is the safe method
whenever you want to exclude missing values. (Of course, if missing means in
practice "too high to be measured", then you might want to include missing.)
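The two counting approaches can be checked against each other in a short sketch, here with the guard against missing values included. The variable names nhimpg_by and nhimpg_egen are hypothetical, and bysort is used as a shorthand for sorting and by: in one statement.

```stata
sysuse auto, clear

* Running count of nonmissing mpg above 30 within each rep78 group,
* then overwrite it with the group total held in the last observation
bysort rep78: generate nhimpg_by = sum(mpg > 30 & mpg < .)
by rep78: replace nhimpg_by = nhimpg_by[_N]

* The same group totals in one step
egen nhimpg_egen = total(mpg > 30 & mpg < .), by(rep78)

* Stops with an error if the two approaches ever disagree
assert nhimpg_by == nhimpg_egen
```

If the final assert passes silently, the two variables hold the same counts in every observation.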
Rule 2: Logical or Boolean arguments, such as the argument to if or
while, may take on any value, not just 0 or 1; 0 is treated as false
and any other numeric value as true
Now consider what happens if you type something like
. list mpg if foreign == 1
Stata lists mpg for those observations for which foreign is
equal to 1 (and does not list it if this is not so). A more long-winded
explanation of this process is that Stata lists mpg whenever the logical
expression foreign == 1 is true or evaluates to 1.
This looks like the same idea in a different form. It is, but there
are extra twists. Consider now
. list mpg if foreign
There are no relational or logical operators in sight, but Stata is
broad-minded here. It will still try its best to find a way of deciding on
true or false; in fact, it will accept any argument that evaluates to a
number not 0 as true, and any argument that evaluates to 0 as
false. If the mathematical or computer jargon "argument" is new
to you, think of it here as indicating whatever is fed to if.
For a numeric variable such as foreign, Stata looks at the values of
that variable, and not 0 is treated as true and 0 as false. In other words,
. whatever if foreign
and
. whatever if foreign != 0
are exactly equivalent. This is true for any numeric variable. In
practice, the shorter form can stand in for a test of equality with 1 if and
only if you have an indicator variable that takes only the values 0 or 1.
The two statements
. list mpg if foreign == 1
. list mpg if foreign
are equivalent in practice in the auto dataset. In the first
statement, Stata evaluates the expression foreign == 1, and then
executes the action indicated (to list) if and only if the expression
is true, or evaluates numerically to 1. In the second statement, Stata looks
at the values of the variable foreign, and then executes the action
if and only if the value is a number not 0. In the auto dataset,
foreign is not 0 when and only when it is equal to 1, so the two
conditions are satisfied by exactly the same observations. Over time this
will save you many keystrokes when you are working with indicator variables,
and it will let you type Stata syntax close to the way you are thinking,
say, if female or even if !female. (The ! is a way of
reversing the choice: ! flips any value not 0 to 0, and any value 0
to 1.) But remember that numeric missings count as not 0, because Stata
treats them as higher than any other numeric value.
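A brief sketch of the shortcut and of negation, assuming the auto dataset, in which foreign takes only the values 0 and 1:

```stata
sysuse auto, clear

* foreign holds only 0 and 1, so these two counts agree
count if foreign
count if foreign == 1

* ! reverses the choice: this counts the domestic cars
count if !foreign

* ! maps any value not 0 to 0, and 0 to 1; missing counts as not 0
display !5
display !0
display !.
```

The three display lines should show 0, 1, and 0 respectively.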
You can always check, either interactively or in a program, that a variable
has only the values 0 and 1 by using assert:
. assert varname == 0 | varname == 1
If varname were equal to any other value, Stata would deny the
assertion. If you typed, perhaps by accident,
. list mpg if rep78
you would get a list for all observations, because rep78 is never 0 (and
its missing values also count as not 0). It is the same logic.
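The check and the accident can both be tried on the auto dataset. In this sketch, capture is used only so that the failed assertion does not stop a do-file; that choice is an assumption of the example, not part of the advice above.

```stata
sysuse auto, clear

* Passes silently: foreign takes only the values 0 and 1
assert foreign == 0 | foreign == 1

* Fails for rep78; capture suppresses the error message and
* leaves a return code that is not 0 in _rc
capture assert rep78 == 0 | rep78 == 1
display _rc

* rep78 is never 0, so every observation satisfies the condition
count if rep78
display r(N) == _N
```

The final display should show 1, because the count equals the number of observations.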
If the argument were just a number, then the same logic would still apply.
This logic can also be useful with if. For example, you could count
missing values and take some action only if one or more missing values were
present. It can also be useful with the while command, which is more of a
programmer's command and which we illustrate briefly here. while 1 gives
you an endless loop: the 1 is arbitrary here, as any number not 0 would do.
Presumably, within your otherwise endless loop, you will add some test that
gets Stata out of the loop, say, with continue, break.
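Both uses of a plain number as the argument can be sketched briefly. The choice of rep78, the local macro i, and the limit of 3 passes are arbitrary assumptions for illustration; continue, break is one way of leaving the loop.

```stata
sysuse auto, clear

* Act only if the count of missing values is not 0
count if missing(rep78)
if r(N) {
    display "rep78 has " r(N) " missing values"
}

* An otherwise endless loop, left by continue, break
local i = 0
while 1 {
    local ++i
    if `i' >= 3 {
        continue, break
    }
}
display `i'
```

The loop is left on the third pass, so the final display should show 3.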
A related technique is to set a flag and to exit the loop only if and when
that flag has been changed:
. local worktodo = 1
. while `worktodo' {
      * program statements go here; set the flag to 0 when the task is completed
      local worktodo = 0
}
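A concrete version of the flag technique might look like the following sketch. The "task" of subtracting 3 from a local macro n until it is no longer positive is invented purely to give the loop something to do.

```stata
* The flag starts as true (any value not 0 would do)
local worktodo = 1
local n = 10

while `worktodo' {
    * The task: subtract 3 until n is no longer positive
    local n = `n' - 3
    if `n' <= 0 {
        local worktodo = 0
    }
}

display `n'
```

Here n goes 7, 4, 1, -2, at which point the flag is set to 0 and the loop ends.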
Finally, if you were to supply, perhaps by accident, the name of a string
variable or a text string as an argument to if or while, there
would be an error message, as Stata cannot interpret either as a numeric
argument. Only numeric arguments can be considered true or false.
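The error is easy to provoke with the string variable make in the auto dataset. As a sketch, capture is used here so that the error does not stop a do-file; a comparison involving the string variable, by contrast, is a logical expression with a numeric value and is accepted.

```stata
sysuse auto, clear

* make is a string variable, so it cannot serve as the argument to if;
* capture traps the error and leaves a return code that is not 0 in _rc
capture count if make
display _rc

* A comparison with a string yields 0 or 1, so this is fine
count if make != ""
```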