| Title | Using named expressions with collect get to add results into a table | |
| Authors |
Pei-Chun Lai, StataCorp Jeff Pitblado, StataCorp |
The risk ratio (RR) and odds ratio (OR) are two common measures of effect size that we can obtain after fitting a logistic regression model. While we can get OR values in the results from the logistic command (and logit with the or option), RR values are neither estimated nor stored in a way that allows us to easily get them into a table. How could we estimate these two measures and include them in a table? In this FAQ, we focus primarily on using “named expressions” with the collect get command to get RR and OR values and on including them in a table using the collect suite of commands.
We will use the nhanes2l dataset and model described in Logistic regression 1: Introduction.
Our variables of interest are highbp and diabetes.
. clear all . webuse nhanes2l (Second National Health and Nutrition Examination Survey) . describe highbp diabetes
| Variable Storage Display Value |
| name type format label Variable label |
| highbp byte %8.0g * High blood pressure |
| diabetes byte %12.0g diabetes Diabetes status |
First, let us add labels to the levels of highbp:
. label define highbp 0 "Normotensive" 1 "Hypertensive" . label values highbp highbp
Then we examine the relationship between the variables highbp and diabetes using the tabulate command:
. tabulate highbp diabetes
| High blood | Diabetes status | |||
| pressure | Not diabe Diabetic | Total | ||
| Normotensive | 5,795 178 | 5,973 | ||
| Hypertensive | 4,055 321 | 4,376 | ||
| Total | 9,850 499 | 10,349 | ||
The odds of hypertension for people with diabetes is
(321/499)/{1 − (321/499)} = 321/178 = 1.80337
The odds of hypertension for people without diabetes is
(4055/9850)/{1 − (4055/9850)} = 4055/5795 = 0.69974
The resulting OR is
1.80337/0.69974 = 2.57720
The risk of hypertension for people with diabetes is
321/499 = 0.64329
For people without diabetes, the risk is
4055/9850 = 0.41168
Then the RR for people with diabetes is
0.64329/0.41168 = 1.56260
Now let us fit a logistic regression model using highbp as the binary outcome variable and i.diabetes as a binary predictor variable:
. logit highbp i.diabetes
Iteration 0: Log likelihood = -7049.6663
Iteration 1: Log likelihood = -6998.1058
Iteration 2: Log likelihood = -6998.1054
Logistic regression Number of obs = 10,349
LR chi2(1) = 103.12
Prob > chi2 = 0.0000
Log likelihood = -6998.1054 Pseudo R2 = 0.0073
| highbp | Coefficient Std. err. z P>|z| [95% conf. interval] | |
| diabetes | ||
| Diabetic | .9467015 .0956682 9.90 0.000 .7591952 1.134208 | |
| _cons | -.3570448 .0204737 -17.44 0.000 -.3971725 -.3169171 | |
After we run the logit command, the OR and RR values are not stored as scalars in e() or r(). So how can we compute these ratios and include them in a table? One solution is to use the margins command to get the predicted probabilities of highbp = 1 for people with diabetes and people without diabetes:
. margins diabetes Adjusted predictions Number of obs = 10,349 Model VCE: OIM Expression: Pr(highbp), predict()
| Delta-method | ||
| Margin std. err. z P>|z| [95% conf. interval] | ||
| diabetes | ||
| Not diabetic | .4116751 .0049587 83.02 0.000 .4019563 .421394 | |
| Diabetic | .6432864 .0214443 30.00 0.000 .6012563 .6853164 | |
| 0. 1 | . | |
| diabetes diabetes | ||
| r1 | .4116751 .6432864 |
The values of the predicted probabilities are stored in the r(b) matrix, and the formulas for our measures of interest become
RR = r(b)[1,2]/ r(b)[1,1]
OR = r(b)[1,2]/(1 − r(b)[1,2]) / r(b)[1,1]/(1 − r(b)[1,1])
Here we use the collect get command to compute these values and put them into a collection:
. collect get RR = (r(b)[1,2]/r(b)[1,1]) OR = ((r(b)[1,2]/(1 − r(b)[1,2]))/(r(b)[1,1]/(1 − r(b)[1,1])))
In the above, we apply the named expression syntax to define the RR and OR. They are specified as name = (exp). The name becomes the level of the result dimension used to tag the collected value. Therefore, the RR’s value is tagged with result[RR], and the OR’s value is tagged with result[OR]. While expressions with spaces must be bound in parentheses, we recommend binding all expressions in parentheses.
Here we use collect levelsof results to show the names of the values (the levels of the result dimension) we have collected.
. collect levelsof result
Collection: default
Dimension: result
Levels: OR RR
Our call to collect get also updated the automatic levels of the result dimension. Automatic levels are the levels of a dimension that are automatically displayed in the table when the dimension is specified without explicit levels in the collect layout command. To view these, the collect query autolevels result command displays the automatic levels for the result dimension:
. collect query autolevels result
Automatic dimension levels
Collection: default
Dimension: result
Levels: RR OR
For more information on the automatic levels, see FAQ: What are the autolevels of a dimension in a table (collection)?
Let's arrange our collected items into a table.
. collect layout (cmdset) (result)
Collection: default
Rows: cmdset
Columns: result
Table 1: 1 x 2
| RR OR | ||
| 1 | 1.562607 2.577195 | |
collect's default numeric format is %9.0g. We use the nformat() option with the collect style cell command to modify this format to display four digits after the decimal:
. collect style cell result, nformat(%7.4f)
Then we specify the labels for result[RR] and result[OR] as follows:
. collect label levels result RR "Risk Ratio" OR "Odds Ratio"
We use collect preview to preview the table of RR and OR:
. collect preview
| Risk Ratio Odds Ratio | ||
| 1 | 1.5626 2.5772 | |
Finally, we apply the collect export command to export this table to a Word file:
. collect export RR_OR.docx, replace
(collection default exported to file RR_OR.docx)
The full code is provided below:
clear all webuse nhanes2l describe highbp diabetes label define highbp 0 "Normotensive" 1 "Hypertensive" label values highbp highbp tabulate highbp diabetes logit highbp i.diabetes margins diabetes matlist r(b) collect get RR = (r(b)[1,2]/r(b)[1,1]) OR = ((r(b)[1,2]/(1- r(b)[1,2]))/(r(b)[1,1]/(1- r(b)[1,1]))) collect levelsof result collect query autolevels result collect layout (cmdset) (result) collect style cell result, nformat(%7.4f) collect label levels result RR "Risk Ratio" OR "Odds Ratio" collect preview collect export RR_OR.docx, replace