Stata | FAQ: Creating variables that are -plurality- measures


How do I calculate measures such as percent improved minus percent deteriorated?

Title		Creating variables that are “plurality” measures
Author		Nicholas J. Cox, Durham University, UK

When reporting ordered or graded scales, working with simple descriptive summaries like

% improved − % deteriorated

% ranking as good − % ranking as bad

is sometimes helpful. In such summaries, omitting any neutral or middle category is common (but not essential). Clearly, such a measure gives the preponderance of two tails: if everybody improved, we get 100, and, if everybody got worse, we get −100.

In political terms, an election could be imagined in which there are votes “for” and “against” from these two categories, and from that context, these measures may be described as plurality measures. (Is there a better general term, or any term that is standard in some field, for particular examples of such measures?) Whatever the terminology, such measures are discussed in Tukey (1977, 498–502), Zeisel (1985, 75–77), Wilkinson (2005, 57–58), and Wexler, Shaffer, and Cotgreave (2017, 186–200).

Naturally, the percent formulation is not compulsory, and you could just as easily—in fact, a little more easily—work with proportions or fractions with results ranging from 1 to −1. In either case, using a difference is natural whenever thinking is in terms of the percent or proportion scale being used. Also, a ratio such as

% ranking as good / % ranking as bad

may be less desirable with small denominators. Either the result may be unstable, or, if the denominators are ever 0, it may be indeterminate.

Consider a three-grade coding, say, 1 = improved, 2 = unchanged, and 3 = deteriorated. To get this summary, we need to generate a score, either on the proportion scale

        (*) gen score = (code == 1) - (code == 3)

or on the percent scale

        gen score = 100 * ((code == 1) - (code == 3))

and that’s essentially it. We just follow the generation by summarize; tabulate, summarize( ); tabstat; or whatever we need.
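As a minimal sketch of that workflow, the following uses a small made-up dataset (the ten values of code are purely illustrative, not from any real study). The mean of score on the percent scale is then % improved − % deteriorated.

```stata
* Hypothetical data: 10 subjects coded 1 = improved, 2 = unchanged, 3 = deteriorated
clear
input byte code
1
1
1
1
1
2
2
3
3
3
end

* Score on the percent scale: 100, 0, or -100
generate score = 100 * ((code == 1) - (code == 3))

* The mean of score is % improved - % deteriorated: here 50 - 30 = 20
summarize score
display r(mean)
```

With five improved, two unchanged, and three deteriorated, the mean is 20. The same score variable could equally be fed to tabstat or to tabulate with the summarize() option to get the measure by group.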

Taking (*) piece by piece: if code is 1,

        code == 1

is true and is evaluated as 1, and

        code == 3

is false and evaluated as 0, so

        (code == 1) - (code == 3)

evaluates as 1. The principle of true-or-false logical expressions being evaluated as 1 or 0 is discussed at [U] 13.2.3 Relational operators. If code is 2, then score is 0, and if code is 3, then score is −1. (If we multiply by 100, then score is 100, 0, and −100.)
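One quick way to check this arithmetic, without creating any data, is to let display evaluate the expression for each possible code. This is only a sanity check, with the literal values 1, 2, and 3 standing in for code.

```stata
* Each logical expression evaluates as 1 if true and 0 if false
* code is 1: result is 1
display (1 == 1) - (1 == 3)
* code is 2: result is 0
display (2 == 1) - (2 == 3)
* code is 3: result is -1
display (3 == 1) - (3 == 3)
```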

With this coding, we could also get the same result by

        generate score = 2 - code

If the coding had been reversed, from 1 = deteriorated to 3 = improved, then code - 2 would have worked. For other simple coding sequences, some other linear transformation would have worked. So, why place so much stress on the earlier formulation? It generalizes much more easily to messier examples. Take a five-point scale such as rep78 in the auto data or 1 = strongly agree to 5 = strongly disagree. We might decide to omit the 3s, lump together the two codes in each tail, and type

        gen score = (code >= 4) - (code <= 2)

Just as before, the true-or-false expressions evaluate as 1 if true and 0 if false.

A pitfall to be pointed out immediately is that missing values count as higher than any other numeric value, so code >= 4 is true whenever code is missing. Hence, you will be safer with

        gen score = (code >= 4) - (code <= 2) if code < .
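The pitfall can be seen directly in a small sketch. The six observations below are made up for illustration, and the variable names unsafe and safe are hypothetical.

```stata
* Hypothetical five-point scale with one missing value
clear
input byte code
1
2
3
4
5
.
end

* Without the guard, the missing value satisfies code >= 4 and scores 1
generate unsafe = (code >= 4) - (code <= 2)

* With the guard, the observation with missing code gets a missing score
generate safe = (code >= 4) - (code <= 2) if code < .

list code unsafe safe
```

In the last observation, unsafe is 1 while safe is missing, so only safe keeps that observation out of later summaries. Writing the condition as if !missing(code) would be an equivalent guard.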

Similar ideas may be useful in situations with just two categories. Also, they may arise with different data structures. Let us illustrate both points with the idea of looking at gender roles across a set of activities, and

        % who are female − % who are male

as a way of summarizing data on who does what. If, in a village, 21 women and zero men do laundry, four men and 11 women fetch water, and 14 men and zero women take care of cows, then neither the male–female ratio nor the female–male ratio can be used throughout to summarize the balance of the sexes. Whenever zero is a denominator, the ratio is indeterminate. Even if no zeros are present, we should worry about sensitivity. However, the measure above is one which is always practical. If the data come as three variables, one for activity, f for females, and m for males, then no logical expressions are needed. Simply type

        gen balance = 100 * ((f/(f + m)) - (m/(f + m)))
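As a sketch, the village counts quoted above can be entered directly and the measure computed for each activity. The variable name balance2 is hypothetical and shows an algebraically equivalent form that divides only once.

```stata
* Counts of females (f) and males (m) for the three activities in the text
clear
input str8 activity f m
laundry 21 0
water 11 4
cows 0 14
end

* % who are female - % who are male
generate balance = 100 * ((f/(f + m)) - (m/(f + m)))

* Algebraically the same measure with a single division
generate balance2 = 100 * (f - m) / (f + m)

list
```

Laundry comes out as 100, cows as −100, and fetching water as 100 × 7/15, or about 46.7. The measure is defined even where one of the ratios would require division by zero. It would be missing only for an activity that nobody does at all, because then f + m is 0.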

References

Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison–Wesley.

Wexler, S., J. Shaffer, and A. Cotgreave. 2017. The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios. Hoboken, NJ: John Wiley.

Wilkinson, L. 2005. The Grammar of Graphics. 2nd ed. New York: Springer.

Zeisel, H. 1985. Say It with Figures. 6th ed. New York: Harper & Row.
