Stata's lrtest command is a postestimation tool for conducting likelihood-ratio tests after fitting regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, diabetes, hlthstat, and age.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol diabetes hlthstat age

Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
diabetes        byte    %12.0g     diabetes   Diabetes status
hlthstat        byte    %20.0g     hlth       Health status
age             byte    %9.0g                 Age (years)

. summarize bpsystol diabetes hlthstat age

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    diabetes |     10,349    .0482172    .2142353          0          1
    hlthstat |     10,335    2.586164    1.206196          1          5
         age |     10,351    47.57965    17.21483         20         74
bpsystol measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg; diabetes is an indicator variable for diabetes status coded 0 or 1; hlthstat is a categorical variable with five categories of health status; and age records age in years, ranging from 20 to 74.
We are going to fit a series of linear regression models for the outcome variable bpsystol. Likelihood-ratio tests allow us to test hypotheses about one or more coefficients in a regression model. The test usually involves the following five steps (a compact sketch of the whole pattern appears after the list):
Fit a “full” regression model.
Store the parameter estimates from the full model by using estimates store.
Fit a “reduced” regression model.
Store the parameter estimates from the reduced model by using estimates store.
Conduct the likelihood-ratio test using lrtest.
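Here is a minimal sketch of that pattern as a do-file, using placeholder names y, x1, and x2 rather than the NHANES variables we use below:

* Step 1: fit the "full" model (y, x1, and x2 are placeholders)
regress y x1 x2

* Step 2: store its parameter estimates under a name you choose
estimates store full

* Step 3: fit the "reduced" model, here dropping x2
regress y x1

* Step 4: store the reduced-model estimates
estimates store reduced

* Step 5: compare the stored models with a likelihood-ratio test
lrtest full reduced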
Let's run an example to see how it works. In step one, our full model is a linear regression model using the continuous outcome variable bpsystol and the predictor variables diabetes, hlthstat, and age. We use factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors and age is a continuous predictor. We also use the interaction operator ## to request the main effects of diabetes and age along with their interaction. And we use the # operator to request the interaction of age with itself, which is equivalent to the square of age.
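As a quick reminder, the ## operator is shorthand for the main effects plus the interaction, so the model we are about to fit could equivalently be specified with every term written out (output not shown here):

* equivalent specification with i.diabetes##c.age expanded by hand
regress bpsystol i.hlthstat i.diabetes c.age i.diabetes#c.age c.age#c.age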
. regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------
In step two, we use estimates store to temporarily store the parameter estimates in memory. Let's name our estimates full.
. estimates store full
The output includes a Wald test for the null hypothesis that the age-squared coefficient, labeled c.age#c.age, equals 0. The t statistic equals 6.52, and the p-value equals 0.000. Let's test this same hypothesis with a likelihood-ratio test instead of a Wald test. To do so, in step three we fit a reduced model that is identical to the model above except that it omits the age-squared term.
. regress bpsystol i.hlthstat i.diabetes##c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(7, 10327)     =    467.32
       Model |   1354374.9         7  193482.129   Prob > F        =    0.0000
    Residual |  4275598.45    10,327  414.021347   R-squared       =    0.2406
-------------+----------------------------------   Adj R-squared   =    0.2401
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.348

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |   .8834185   .5775661     1.53   0.126     -.248723     2.01556
       Good  |   2.377764   .5714262     4.16   0.000     1.257658     3.49787
       Fair  |   4.285299    .682122     6.28   0.000     2.948208    5.622391
       Poor  |    3.21291   .9070098     3.54   0.000     1.434995    4.990825
             |
    diabetes |
   Diabetic  |   -7.50662   4.956266    -1.51   0.130    -17.22186    2.208621
         age |    .601485   .0127405    47.21   0.000     .5765111    .6264589
             |
    diabetes#|
       c.age |
   Diabetic  |   .2399378   .0804389     2.98   0.003      .082262    .3976136
             |
       _cons |   100.1203   .6583324   152.08   0.000     98.82986    101.4108
------------------------------------------------------------------------------
We can complete step four by storing the parameter estimates from this reduced model in memory.
. estimates store reduced
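As a quick aside, you can confirm what is currently stored with estimates dir, and you can make a stored set of results active again with estimates restore:

* list the sets of estimation results stored in memory
estimates dir

* make the full model's results the active results again, if needed
estimates restore full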
In step five, we use lrtest to calculate a likelihood-ratio test comparing the reduced model to the full model.
. lrtest full reduced

Likelihood-ratio test
Assumption: reduced nested within full

 LR chi2(1) =  42.42
Prob > chi2 = 0.0000
The output reports a test statistic labeled LR chi2(1) and a p-value labeled Prob > chi2. The test statistic is our likelihood-ratio chi-squared and equals 42.42. The p-value is calculated from a chi-squared distribution with one degree of freedom and equals 0.0000. What does this mean?
Our full model included a coefficient for age-squared, while our reduced model did not. So our likelihood-ratio test is testing the null hypothesis that the age-squared coefficient equals 0. The test involves one coefficient, so it has one degree of freedom. The large chi-squared statistic and small p-value tell us that our result is not consistent with the null hypothesis.
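If you would like to verify the arithmetic, the statistic is twice the difference between the two models' log likelihoods, and the p-value is the upper tail of a chi-squared distribution with one degree of freedom. Here is a minimal sketch, assuming the estimates stored above as full and reduced are still in memory:

* retrieve the log likelihood saved by each stored model
estimates restore full
scalar ll_full = e(ll)

estimates restore reduced
scalar ll_reduced = e(ll)

* likelihood-ratio chi-squared statistic and its p-value
display "LR chi2(1)  = " 2*(ll_full - ll_reduced)
display "Prob > chi2 = " chi2tail(1, 2*(ll_full - ll_reduced))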
Our likelihood-ratio test required two assumptions. The first assumption is that the reduced model is nested within the full model. This means that all the coefficients in the reduced model are also in the full model. For example, our full model could include covariates such as x1, x2, x3, x4, and x5, and our reduced model could include x1, x2, and x3. But our reduced model may not include a covariate named x6 because x6 is not included in the full model.
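In code, using the hypothetical outcome y and covariates x1 through x6 from the example above, a valid nested comparison looks like this:

* full model with five covariates
regress y x1 x2 x3 x4 x5
estimates store full

* reduced model that uses a subset of the full model's covariates (nested)
regress y x1 x2 x3
estimates store reduced

lrtest full reduced

* a reduced model that added x6 would not be nested within full,
* because x6 does not appear in the full model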
The second assumption requires us to use the same sample for the full and the reduced models. Stata's regression commands require observations to have nonmissing data for every variable included in a model. This can result in different sample sizes for the full and reduced models. Let's look at an example with this issue and see how to deal with it. Our full model above was fit using 10,335 observations. Let's fit a reduced model that omits the variable hlthstat.
. regress bpsystol i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(4, 10344)     =    817.53
       Model |  1353111.75         4  338277.939   Prob > F        =    0.0000
    Residual |  4280168.29    10,344  413.782704   R-squared       =    0.2402
-------------+----------------------------------   Adj R-squared   =    0.2399
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.342

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |  -.8886553   4.994811    -0.18   0.859    -10.67945    8.902141
         age |   .0640567   .0865028     0.74   0.459    -.1055054    .2336188
             |
    diabetes#|
       c.age |
   Diabetic  |   .1403559   .0813022     1.73   0.084    -.0190122    .2997239
             |
 c.age#c.age |   .0061116   .0009246     6.61   0.000     .0042992    .0079239
       _cons |    111.823   1.812009    61.71   0.000     108.2711    115.3749
------------------------------------------------------------------------------
Our reduced model was fit using 10,349 observations. After we store these estimates, again under the name reduced, lrtest exits with an error message telling us that the sample sizes differ.
. estimates store reduced

. lrtest full reduced
observations differ: 10335 vs. 10349
r(498);
We can force the reduced model to use the same sample as the full model by adding the if e(sample) qualifier when we fit the reduced model. The function e(sample) marks the observations used by the most recent estimation command, so we first refit the full model and then fit the reduced model on only those observations.
. regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------
. regress bpsystol i.diabetes##c.age c.age#c.age if e(sample)

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    816.84
       Model |  1352846.11         4  338211.527   Prob > F        =    0.0000
    Residual |  4277127.24    10,330  414.049104   R-squared       =    0.2403
-------------+----------------------------------   Adj R-squared   =    0.2400
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.348

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |   -.889083   4.996564    -0.18   0.859    -10.68332    8.905151
         age |    .065459   .0865797     0.76   0.450    -.1042539     .235172
             |
    diabetes#|
       c.age |
   Diabetic  |   .1401007   .0813319     1.72   0.085    -.0193256     .299527
             |
 c.age#c.age |   .0061008   .0009255     6.59   0.000     .0042867    .0079149
       _cons |    111.795   1.813353    61.65   0.000     108.2404    115.3495
------------------------------------------------------------------------------

. estimates store reduced

. lrtest full reduced
The null hypothesis for this likelihood-ratio test is that the four hlthstat coefficients are simultaneously equal to 0, so the test has four degrees of freedom. The large chi-squared statistic and small p-value suggest that our results are inconsistent with the null hypothesis.
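An alternative to the if e(sample) approach, sketched here rather than run above, is to restrict every model to observations that are complete on all of the modeling variables; missing() returns 1 when any of its arguments is missing:

* fit both models on the common complete-case sample
regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age ///
    if !missing(bpsystol, hlthstat, diabetes, age)
estimates store full

regress bpsystol i.diabetes##c.age c.age#c.age ///
    if !missing(bpsystol, hlthstat, diabetes, age)
estimates store reduced

lrtest full reduced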
You can read more about factor-variable notation, storing estimates, likelihood-ratio tests, and the lrtest command by clicking on the links to the manual entries below. You can also watch a demonstration of these commands on YouTube by clicking on the link below.
Read more in the Stata Base Reference Manual: see [R] lrtest, [R] estimates store, and [R] regress. In the Stata User’s Guide, see [U] 11.4.3 Factor variables.