FAQ: How do I include risk ratio (RR) and odds ratio (OR) in a table after fitting a logistic regression model?

Home / Resources & support / FAQs / Using named expressions with collect get to add results into a table

How do I include the risk ratio (RR) and odds ratio (OR) in a table after fitting a logistic regression model?

Title:		Using named expressions with collect get to add results into a table
Authors		Pei-Chun Lai, StataCorp Jeff Pitblado, StataCorp

The risk ratio (RR) and odds ratio (OR) are two common measures of effect size after fitting a logistic regression model. While we can get OR values in the results from the logistic command (and logit with the option or), RR values are neither estimated nor stored in a way to easily get them into a table. How could we estimate these two measures and include them in a table? In this FAQ, we focus primarily on using “named expressions” with the collect get command to get RR and OR values and including them in a table using the collect suite of commands.

We will use the nhanes2l dataset and model described in Logistic regression 1: Introduction.

Our variables of interest are highbp and diabetes.

. clear all

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe highbp diabetes


Variable      Storage   Display    Value                                
    name         type    format    label      Variable label            

highbp          byte    %8.0g               * High blood pressure       
diabetes        byte    %12.0g     diabetes   Diabetes status

First, let us add labels to the levels of highbp:

. label define highbp 0 "Normotensive" 1 "Hypertensive"

. label values highbp highbp

Then we examine the relationship between the variables highbp and diabetes using the tabulate command:

. tabulate highbp diabetes


  High blood     Diabetes status           
    pressure   Not diabe   Diabetic       Total

Normotensive       5,795        178       5,973
Hypertensive       4,055        321       4,376

       Total       9,850        499      10,349

The odds of having hypertension for someone with diabetes is

(321/499)/{1 − (321/499)} = 321/178 = 1.80337

The odds of having hypertension for someone without diabetes is

(4055/9850)/{1 − (4055/9850)} = 4055/5795 = 0.69974

The resulting OR is

1.80337/0.69974 = 2.5772

The risk of having hypertension for someone with diabetes is

321/499 = 0.643287

For someone without diabetes, the risk is

4055/9850 = 0.411675

Then, the RR for someone with diabetes is

0.643287/0.411675 = 1.5626

Now let’s fit a logistic regression model using highbp as the binary outcome variable and i.diabetes as a binary predictor variable:

. logit highbp i.diabetes

Iteration 0:  Log likelihood = -7049.6663  
Iteration 1:  Log likelihood = -6998.1058  
Iteration 2:  Log likelihood = -6998.1054  

Logistic regression                                     Number of obs = 10,349
                                                        LR chi2(1)    = 103.12
                                                        Prob > chi2   = 0.0000
Log likelihood = -6998.1054                             Pseudo R2     = 0.0073



      highbp   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
   
    diabetes  
   Diabetic      .9467015   .0956682     9.90   0.000     .7591952    1.134208
       _cons    -.3570448   .0204737   -17.44   0.000    -.3971725   -.3169171

After running the logit command, Stata does not store OR and RR values as scalars in e() or r(). So how can we compute these ratios and include them in a table? First, we use the margins command to obtain the predicted probabilities of highbp = 1 for someone with diabetes and without diabetes:

. margins diabetes

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OIM

Expression: Pr(highbp), predict()



                           Delta-method
                    Margin   std. err.      z    P>|z|     [95% conf. interval]
   
     diabetes  
Not diabetic      .4116751   .0049587    83.02   0.000     .4019563     .421394
    Diabetic      .6432864   .0214443    30.00   0.000     .6012563    .6853164



. matlist r(b)


                      0.         1 .
               diabetes   diabetes 
          r1   .4116751   .6432864

The values of the predicted probabilities are stored in the r(b) matrix, and the formula for the RR becomes

RR = r(b)[1,2]/r(b)[1,1]

Here we use the collect get command to compute the RR and put it into a collection:

. collect get RR=(r(b)[1,2]/r(b)[1,1]), tags(cmdset[1])

Next we use the margins diabetes, expression(exp(predict(xb))) command to estimate the odds of having hypertension for someone with and without diabetes:

. margins diabetes, expression(exp(predict(xb)))

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OIM

Expression: exp(predict(xb))



                           Delta-method
                    Margin   std. err.      z    P>|z|     [95% conf. interval]
   
     diabetes  
Not diabetic      .6997412   .0143263    48.84   0.000     .6716622   .7278201
    Diabetic      1.803369   .1685281    10.70   0.000     1.47306    2.133678



. matlist r(b)


                      0.         1 .
               diabetes   diabetes 
          r1   .6997412   1.803369

The values of the predicted odds are stored in the r(b) matrix, and the formula for the OR becomes

OR = r(b)[1,2]/r(b)[1,1]

Again, we use the collect get command to compute the OR and put it into a collection:

. collect get OR=(r(b)[1,2]/r(b)[1,1]), tags(cmdset[1])

Above, we apply the named expression syntax, name = (exp), to define the RR and OR. The assigned name becomes the level of the result dimension used to tag the collected value. Therefore, the RR’s value is tagged with result[RR], and the OR’s value is tagged with result[OR]. While expressions with spaces must be bound in parentheses, we recommend binding all expressions in parentheses regardless of spaces.

In addition, we use the tags() option to assign the cmdset[1] tag to both of these ratios in the collection. Later, we will specify the cmdset[1] tag as the row tag in the collect layout command so that both ratios appear on the same row.

Here we use collect levelsof results to show the names of the values (the levels of the result dimension) we have collected:

. collect levelsof result

Collection: default
 Dimension: result
    Levels: OR RR

Our call to collect get also updated the automatic levels of the result dimension. Automatic levels are the levels of a dimension that are automatically displayed in the table when the dimension is specified without explicit levels in the collect layout command. The collect query autolevels result command displays the automatic levels for the result dimension:

. collect query autolevels result

Automatic dimension levels
Collection: default
 Dimension: result
    Levels: RR OR

For more information on the automatic levels, see FAQ: What are the autolevels of a dimension in a table (collection)?

Let's arrange our collected items into a table.

. collect layout (cmdset[1]) (result)

Collection: default
      Rows: cmdset[1]
   Columns: result
   Table 1: 1 x 2



          RR       OR
   
1   1.562607 2.577195

The default numeric format of the collect command is %9.0g. We use the nformat() option with the collect style cell command to modify this format to display four digits after the decimal:

. collect style cell result, nformat(%7.4f)

Then we specify the labels for result[RR] and result[OR] as follows:

. collect label levels result RR "Risk Ratio" OR "Odds Ratio"

We use collect preview to preview the table containing the RR and OR:

. collect preview



    Risk Ratio Odds Ratio
   
1       1.5626     2.5772

Finally, we apply the collect export command to export this table to a Word file:

. collect export RR_OR.docx, replace
(collection default exported to file RR_OR.docx)

The full code is provided below:

clear all
webuse nhanes2l
describe highbp diabetes
label define highbp 0 "Normotensive" 1 "Hypertensive"
label values highbp highbp
tabulate highbp diabetes

*Compute and collect the RR
logit highbp i.diabetes
margins diabetes
matlist r(b)
collect get RR=(r(b)[1,2]/r(b)[1,1]), tags(cmdset[1])

*Compute and collect the OR
margins diabetes, expression(exp(predict(xb)))
matlist r(b)
collect get OR=(r(b)[1,2]/r(b)[1,1]), tags(cmdset[1])

collect levelsof result
collect query autolevels result
collect layout (cmdset[1]) (result)
collect style cell result, nformat(%7.4f)
collect label levels result RR "Risk Ratio" OR "Odds Ratio"
collect preview
*Export the collection to RR_OR.docx file
collect export RR_OR.docx, replace

How do I include the risk ratio (RR) and odds ratio (OR) in a table after fitting a logistic regression model?

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies

High blood	Diabetes status
pressure	Not diabe Diabetic	Total

Normotensive	5,795 178	5,973
Hypertensive	4,055 321	4,376

Total	9,850 499	10,349


highbp		Coefficient Std. err. z P>\|z\| [95% conf. interval]

diabetes
Diabetic		.9467015 .0956682 9.90 0.000 .7591952 1.134208
_cons		-.3570448 .0204737 -17.44 0.000 -.3971725 -.3169171


		Delta-method
		Margin std. err. z P>\|z\| [95% conf. interval]

diabetes
Not diabetic		.4116751 .0049587 83.02 0.000 .4019563 .421394
Diabetic		.6432864 .0214443 30.00 0.000 .6012563 .6853164

Stata/MP4 Annual License (download)

How do I include the risk ratio (RR) and odds ratio (OR) in a table after fitting a logistic regression model?

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies