Likelihood-ratio tests using lrtest

Stata's lrtest command is a postestimation tool for conducting likelihood-ratio tests after fitting regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, diabetes, hlthstat, and age.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol diabetes hlthstat age
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
diabetes        byte    %12.0g     diabetes   Diabetes status
hlthstat        byte    %20.0g     hlth       Health status
age             byte    %9.0g                 Age (years)

. summarize bpsystol diabetes hlthstat age

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    diabetes |     10,349    .0482172    .2142353          0          1
    hlthstat |     10,335    2.586164    1.206196          1          5
         age |     10,351    47.57965    17.21483         20         74

bpsystol measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg; diabetes is an indicator of diabetes status coded 0 or 1; hlthstat is a categorical variable with five health-status categories; and age records age in years, ranging from 20 to 74.

We are going to fit a series of linear regression models for the outcome variable bpsystol. Likelihood-ratio tests allow us to test hypotheses about one or more coefficients in a regression model. The test usually involves the following five steps, sketched in code after the list:

  1. Fit a “full” regression model.

  2. Store the parameter estimates from the full model by using estimates store.

  3. Fit a “reduced” regression model.

  4. Store the parameter estimates from the reduced model by using estimates store.

  5. Conduct the likelihood-ratio test using lrtest.
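Schematically, the five steps map onto Stata commands like this. The sketch below uses hypothetical variables y, x1, and x2 (not variables in our dataset) and uses logistic rather than regress simply to illustrate that the syntax is the same across estimation commands:

. logistic y x1 x2            // 1. fit the full model
. estimates store full        // 2. store its estimates

. logistic y x1               // 3. fit the reduced model (here, dropping x2)
. estimates store reduced     // 4. store its estimates

. lrtest full reduced         // 5. likelihood-ratio test of the dropped term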

Let's run an example to see how it works. In step one, our full model is a linear regression model using the continuous outcome variable bpsystol and the predictor variables diabetes, hlthstat, and age. We use factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors and age is a continuous predictor. We also use the interaction operator ## to request the main effects of diabetes and age along with their interaction. And we use the # operator to request the interaction of age with itself, which is equivalent to the square of age.
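As a quick aside on the notation (not part of the original example): A##B expands to the main effects plus the interaction, and c.age#c.age simply enters the square of age, so the command that follows is equivalent to spelling everything out:

. regress bpsystol i.hlthstat i.diabetes c.age i.diabetes#c.age c.age#c.age

Using the operators keeps the model in factor-variable form rather than requiring you to generate interaction or squared-age variables by hand.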

. regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------

In step two, we use estimates store to temporarily store the parameter estimates in memory. Let's name our estimates full.

. estimates store full

The output includes a Wald test of the null hypothesis that the age-squared coefficient, labeled c.age#c.age, equals 0. The t statistic equals 6.52, and the p-value equals 0.000. Let's test this same hypothesis with a likelihood-ratio test instead of a Wald test. To do so, in step three we fit a reduced model that is the same as the model above but without the age-squared term.

. regress bpsystol i.hlthstat i.diabetes##c.age

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(7, 10327)     =    467.32
       Model |   1354374.9         7  193482.129   Prob > F        =    0.0000
    Residual |  4275598.45    10,327  414.021347   R-squared       =    0.2406
-------------+----------------------------------   Adj R-squared   =    0.2401
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.348

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |   .8834185   .5775661     1.53   0.126     -.248723     2.01556
       Good  |   2.377764   .5714262     4.16   0.000     1.257658     3.49787
       Fair  |   4.285299    .682122     6.28   0.000     2.948208    5.622391
       Poor  |    3.21291   .9070098     3.54   0.000     1.434995    4.990825
             |
    diabetes |
   Diabetic  |   -7.50662   4.956266    -1.51   0.130    -17.22186    2.208621
         age |    .601485   .0127405    47.21   0.000     .5765111    .6264589
             |
    diabetes#|
       c.age |
   Diabetic  |   .2399378   .0804389     2.98   0.003      .082262    .3976136
             |
       _cons |   100.1203   .6583324   152.08   0.000     98.82986    101.4108
------------------------------------------------------------------------------

We can complete step four by storing the parameter estimates from this reduced model in memory.

. estimates store reduced

In step five, we use lrtest to calculate a likelihood-ratio test comparing the reduced model to the full model.

. lrtest full reduced

Likelihood-ratio test
Assumption: reduced nested within full

 LR chi2(1) =  42.42
Prob > chi2 = 0.0000

The output reports a test statistic labeled LR chi2(1) and a p-value labeled Prob > chi2. The test statistic is our likelihood-ratio chi-squared and equals 42.42. The p-value is calculated from a chi-squared distribution with one degree of freedom and equals 0.0000. What does this mean?

Our full model included a coefficient for age-squared while our reduced model did not. So our likelihood-ratio test is testing the null hypothesis that the age-squared coefficient equals 0. The test involves one coefficient, so it has one degree of freedom. The large chi-squared statistic and small p-value tell us that our result is not consistent with the null hypothesis.
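To see where these numbers come from, recall that the likelihood-ratio statistic is twice the difference between the two models' log likelihoods, and the p-value is the upper tail of a chi-squared distribution with one degree of freedom evaluated at that statistic. The sketch below recomputes both by hand; it assumes the stored estimates full and reduced are still in memory and uses e(ll), the log likelihood saved by regress:

. estimates restore full
. scalar ll_full = e(ll)                           // log likelihood of the full model

. estimates restore reduced
. scalar ll_reduced = e(ll)                        // log likelihood of the reduced model

. display 2*(ll_full - ll_reduced)                 // likelihood-ratio chi-squared, approximately 42.42
. display chi2tail(1, 2*(ll_full - ll_reduced))    // p-value from the chi2(1) upper tail

For a single coefficient, the Wald chi-squared is roughly the square of the reported t statistic (6.52^2 is about 42.5), so it is not surprising that the two tests point to the same conclusion here.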

Our likelihood-ratio test required two assumptions. The first assumption is that the reduced model is nested within the full model. This means that all the coefficients in the reduced model are also in the full model. For example, our full model could include covariates such as x1, x2, x3, x4, and x5, and our reduced model could include x1, x2, and x3. But our reduced model may not include a covariate named x6 because x6 is not included in the full model.
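To make the nesting requirement concrete, here is a schematic sketch using those hypothetical variable names (y and x1 through x6 are not variables in our dataset). Note that the Assumption: line in the lrtest output is exactly that, an assumption; lrtest cannot verify nesting for you:

. regress y x1 x2 x3 x4 x5     // full model
. estimates store full

. regress y x1 x2 x3           // nested: every term also appears in the full model
. estimates store reduced
. lrtest full reduced          // a valid comparison

. regress y x1 x2 x6           // not nested: x6 does not appear in the full model
. estimates store notnested    // comparing this with full via lrtest would not be a valid test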

The second assumption requires us to use the same sample for the full and the reduced models. Stata's regression commands require observations to have nonmissing data for every variable included in a model. This can result in different sample sizes for the full and reduced models. Let's look at an example with this issue and see how to deal with it. Our full model above was fit using 10,335 observations. Let's fit a reduced model that omits the variable hlthstat.

. regress bpsystol i.diabetes##c.age c.age#c.age

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(4, 10344)     =    817.53
       Model |  1353111.75         4  338277.939   Prob > F        =    0.0000
    Residual |  4280168.29    10,344  413.782704   R-squared       =    0.2402
-------------+----------------------------------   Adj R-squared   =    0.2399
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.342

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |  -.8886553   4.994811    -0.18   0.859    -10.67945    8.902141
         age |   .0640567   .0865028     0.74   0.459    -.1055054    .2336188
             |
    diabetes#|
       c.age |
   Diabetic  |   .1403559   .0813022     1.73   0.084    -.0190122    .2997239
             |
 c.age#c.age |   .0061116   .0009246     6.61   0.000     .0042992    .0079239
       _cons |    111.823   1.812009    61.71   0.000     108.2711    115.3749
------------------------------------------------------------------------------

. estimates store reduced

Our reduced model was fit using 10,349 observations. When we run lrtest, we will get an error message telling us that the sample sizes differ.

. lrtest full reduced
observations differ: 10335 vs. 10349
r(498);
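Before fixing the problem, it can help to see exactly which observations cause it. The check below is a suggested diagnostic, not a step from the original article; the count should equal 10,349 - 10,335 = 14, the observations that are missing hlthstat but none of the other model variables:

. misstable summarize bpsystol diabetes hlthstat age               // missing values per model variable
. count if missing(hlthstat) & !missing(bpsystol, diabetes, age)   // observations the reduced model gains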

We can force the reduced model to use the same sample as the full model by using the if e(sample) qualifier in the reduced model.

. regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------

. estimates store full

. regress bpsystol i.diabetes##c.age c.age#c.age if e(sample)
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    816.84
       Model |  1352846.11         4  338211.527   Prob > F        =    0.0000
    Residual |  4277127.24    10,330  414.049104   R-squared       =    0.2403
-------------+----------------------------------   Adj R-squared   =    0.2400
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.348

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |   -.889083   4.996564    -0.18   0.859    -10.68332    8.905151
         age |    .065459   .0865797     0.76   0.450    -.1042539     .235172
             |
    diabetes#|
       c.age |
   Diabetic  |   .1401007   .0813319     1.72   0.085    -.0193256     .299527
             |
 c.age#c.age |   .0061008   .0009255     6.59   0.000     .0042867    .0079149
       _cons |    111.795   1.813353    61.65   0.000     108.2404    115.3495
------------------------------------------------------------------------------

. estimates store reduced

. lrtest full reduced

Likelihood-ratio test
Assumption: reduced nested within full

 LR chi2(4) =  46.12
Prob > chi2 = 0.0000

The null hypothesis for this likelihood-ratio test is that all the hlthstat coefficients are simultaneously equal to 0. The large chi-squared statistic and small p-value suggest that our results are inconsistent with the null hypothesis.
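For comparison, you could obtain the Wald version of the same joint test by making the full model's estimates active again and using testparm; this is a brief aside rather than part of the original example. The two tests address the same null hypothesis and will typically lead to the same conclusion in samples this large:

. estimates restore full
. testparm i.hlthstat          // Wald test that all hlthstat coefficients equal 0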

You can read more about factor-variable notation, storing estimates, likelihood-ratio tests, and the lrtest command by clicking on the links to the manual entries below. You can also watch a demonstration of these commands on YouTube by clicking on the link below.

See it in action

Watch Likelihood-ratio tests in Stata.