How do I calculate measures such as percent improved minus percent deteriorated?
Title: Creating variables that are “plurality” measures
Author: Nicholas J. Cox, Durham University, UK
Date: June 2001; revised March 2007
When reporting ordered or graded scales, working
with simple descriptive summaries like
% improved − % deteriorated
or
% ranking as good − % ranking as bad
is sometimes helpful. In such summaries, omitting any neutral or middle
category is common (but not essential). Such a measure shows which of the
two tails predominates: if everybody improved, we get 100, and if
everybody got worse, we get −100.
In political terms, an election could be imagined in which there are votes
“for” and “against” from these two categories, and
from that context, these measures may be described as
plurality measures. (Is there a better general term, or any
term that is standard in some field, for particular examples of such
measures?) Whatever the terminology, such measures are discussed in Tukey
(1977, 498–502), Zeisel (1985, 75–77), and Wilkinson (2005,
57–58).
Naturally, the percent formulation is not compulsory, and you could just as
easily (in fact, a little more easily) work with proportions
or fractions, with results ranging from −1 to 1. In either case, a
difference is the natural summary whenever you are thinking on the percent or
proportion scale. A ratio such as
% ranking as good / % ranking as bad
may be less desirable with small denominators. Either the result may be
unstable, or, if the denominator is ever 0, it is indeterminate.
Consider a three-grade coding, say, 1 = improved, 2 = unchanged, and 3 =
deteriorated. To get this summary, we need to generate a score
(*) gen score = (code == 1) - (code == 3)
or
gen score = 100 * ((code == 1) - (code == 3))
and that’s essentially it. We just follow the generation by
summarize; tabulate, summarize( ); tabstat; or whatever
we need.
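As a minimal sketch of the whole sequence, the do-file fragment below builds a tiny dataset (five observations, invented purely for illustration) and summarizes the percent version of the score.

```stata
* Hypothetical data: 3 improved, 1 unchanged, 1 deteriorated
clear
input code
1
1
1
2
3
end

* 100 if improved, 0 if unchanged, -100 if deteriorated
generate score = 100 * ((code == 1) - (code == 3))

* The mean of score is % improved - % deteriorated
summarize score
display r(mean)
```

Here 60% improved and 20% deteriorated, so the mean of score is 40.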
Taking (*) piece by piece: if code is 1,
code == 1
is true and is evaluated as 1, and
code == 3
is false and evaluated as 0, so
(code == 1) - (code == 3)
evaluates as 1. The principle of true-or-false logical expressions being
evaluated as 1 or 0 is discussed at [U] 13.2.3 Relational operators.
If code is 2, then score is 0, and if code is 3, then
score is −1. (If we multiply by 100, then score is 100,
0, and −100.)
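The three cases can be checked with display, which evaluates an expression and shows the result. This is only a quick check on literal values, not part of the method itself.

```stata
* code = 1 (improved)
display (1 == 1) - (1 == 3)
* code = 2 (unchanged)
display (2 == 1) - (2 == 3)
* code = 3 (deteriorated)
display (3 == 1) - (3 == 3)
```

The three commands show 1, 0, and -1 in turn.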
With this coding, we could also get the same result by
generate score = 2 - code
If the coding had been reversed, from 1 = deteriorated to 3 = improved, then
code - 2 would have worked. For other simple coding sequences, some
other linear transformation would have worked. So, why place so much stress
on the earlier formulation? It generalizes much more
easily to messier examples. Take a five-point scale such as rep78 in the
auto data or 1 = strongly agree to 5 = strongly disagree. We might decide to
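omit the 3s, lump together the two codes in each tail, and type
gen score = (code >= 4) - (code <= 2)
Just as before, the true-or-false expressions evaluate as 1 if true and 0 if
false. Here the high codes count as 1 and the low codes as −1, so swap the
two expressions if the low codes are the favorable ones.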
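A pitfall to be pointed out immediately is that missing values count as
higher than any other numeric value, so a missing code would satisfy
code >= 4 and be scored as 1. Hence, you will be safer with
gen score = (code >= 4) - (code <= 2) if code < .
which leaves score missing whenever code is missing.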
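To see the pitfall in practice, here is a sketch using rep78 from the auto data shipped with Stata, which contains some missing values. The variable names naive and safe are made up for this illustration.

```stata
sysuse auto, clear

* Without the qualifier, missing rep78 satisfies rep78 >= 4 and is scored 1
generate naive = (rep78 >= 4) - (rep78 <= 2)

* With the qualifier, the score is left missing wherever rep78 is missing
generate safe = (rep78 >= 4) - (rep78 <= 2) if rep78 < .

* Show only the observations where the two versions differ
list make rep78 naive safe if missing(rep78)

* summarize ignores missing values, so the two means differ
summarize naive safe
```

For a numeric variable, if !missing(code) is an equivalent qualifier to if code < . The same care applies to the earlier three-grade formula: there a missing code makes both comparisons false, so it would be scored 0 and counted as unchanged unless the same qualifier is added.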
Similar ideas may be useful in situations with just two categories.
Also, they may arise with different data structures. Let us illustrate
both points with the idea of looking at gender roles across a set of
activities, and
% who are female − % who are male
as a way of summarizing data on who does what. If, in a village, 21 women and
zero men do laundry, four men and 11 women fetch water, and 14 men and zero women
take care of cows, then neither the male–female ratio nor the
female–male ratio can be used throughout to summarize the balance of
the sexes. Whenever zero is a denominator, the ratio is indeterminate. Even
if no zeros are present, we should worry about sensitivity. However, the
measure above is one which is always practical. If the data come as three
variables, one for activity, f for females, and m for males,
then no logical expressions are needed. Simply type
gen balance = 100 * ((f/(f + m)) - (m/(f + m)))
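As a sketch with the village figures quoted above, the fragment below enters the three activities by hand and computes the balance. It uses the algebraically equivalent form 100 * (f - m) / (f + m); the str12 width for activity is just a convenient choice.

```stata
clear
input str12 activity f m
"laundry"     21  0
"fetch water" 11  4
"cows"         0 14
end

* Same as 100 * ((f/(f + m)) - (m/(f + m)))
generate balance = 100 * (f - m) / (f + m)
list
```

This gives 100 for laundry, about 46.7 for fetching water, and −100 for cows. An activity with f + m equal to 0 would get a missing balance.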
References
Tukey, J. W. 1977. Exploratory Data Analysis. Reading: Addison–Wesley.
Wilkinson, L. 2005. The Grammar of Graphics. 2nd ed. New York: Springer.
Zeisel, H. 1985. Say It with Figures. 6th ed. New York: Harper & Row.