Stata's lrtest command is a postestimation tool for conducting likelihood-ratio tests after fitting regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, diabetes, hlthstat, and age.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol diabetes hlthstat age

Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
diabetes        byte    %12.0g     diabetes   Diabetes status
hlthstat        byte    %20.0g     hlth       Health status
age             byte    %9.0g                 Age (years)

. summarize bpsystol diabetes hlthstat age

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    diabetes |     10,349    .0482172    .2142353          0          1
    hlthstat |     10,335    2.586164    1.206196          1          5
         age |     10,351    47.57965    17.21483         20         74
bpsystol measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg; diabetes is an indicator variable for diabetes status coded 0 or 1; hlthstat is a categorical variable with five categories of health status; and age records age in years, ranging from 20 to 74.
We are going to fit a series of linear regression models for the outcome variable bpsystol. Likelihood-ratio tests allow us to test hypotheses about one or more coefficients in a regression model. The test usually involves the following five steps (a compact sketch of the whole pattern appears after the list):
Fit a “full” regression model.
Store the parameter estimates from the full model by using estimates store.
Fit a “reduced” regression model.
Store the parameter estimates from the reduced model by using estimates store.
Conduct the likelihood-ratio test using lrtest.
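Here is a minimal sketch of that pattern as a do-file, using placeholder names y, x1, and x2 rather than the NHANES variables we use below:

* Step 1: fit the "full" model (y, x1, and x2 are placeholders)
regress y x1 x2

* Step 2: store its parameter estimates under a name you choose
estimates store full

* Step 3: fit the "reduced" model, here dropping x2
regress y x1

* Step 4: store the reduced-model estimates
estimates store reduced

* Step 5: compare the stored models with a likelihood-ratio test
lrtest full reduced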
Let's run an example to see how it works. In step one, our full model is a linear regression model using the continuous outcome variable bpsystol and the predictor variables diabetes, hlthstat, and age. We use factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors and age is a continuous predictor. We also use the interaction operator ## to request the main effects of diabetes and age along with their interaction. And we use the # operator to request the interaction of age with itself, which is equivalent to the square of age.
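As a quick reminder, the ## operator is shorthand for the main effects plus the interaction, so the model we are about to fit could equivalently be specified with every term written out (output not shown here):

* equivalent specification with i.diabetes##c.age expanded by hand
regress bpsystol i.hlthstat i.diabetes c.age i.diabetes#c.age c.age#c.age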
. regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------
In step two, we use estimates store to temporarily store the parameter estimates in memory. Let's name our estimates full.
. estimates store full
The output includes a Wald test for the null hypothesis that the age-squared coefficient, labeled c.age#c.age, equals 0. The t statistic equals 6.52, and the p-value equals 0.000. Let's test this same hypothesis with a likelihood-ratio test instead of a Wald test. To do so, in step three we fit a reduced model that is identical to the model above except that it omits the age-squared term.
. regress bpsystol i.hlthstat i.diabetes##c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(7, 10327)     =    467.32
       Model |   1354374.9         7  193482.129   Prob > F        =    0.0000
    Residual |  4275598.45    10,327  414.021347   R-squared       =    0.2406
-------------+----------------------------------   Adj R-squared   =    0.2401
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.348

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |   .8834185   .5775661     1.53   0.126     -.248723     2.01556
       Good  |   2.377764   .5714262     4.16   0.000     1.257658     3.49787
       Fair  |   4.285299    .682122     6.28   0.000     2.948208    5.622391
       Poor  |    3.21291   .9070098     3.54   0.000     1.434995    4.990825
             |
    diabetes |
   Diabetic  |   -7.50662   4.956266    -1.51   0.130    -17.22186    2.208621
         age |    .601485   .0127405    47.21   0.000     .5765111    .6264589
             |
    diabetes#|
       c.age |
   Diabetic  |   .2399378   .0804389     2.98   0.003      .082262    .3976136
             |
       _cons |   100.1203   .6583324   152.08   0.000     98.82986    101.4108
------------------------------------------------------------------------------
We can complete step four by storing the parameter estimates from this reduced model in memory.
. estimates store reduced
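As a quick aside, you can confirm what is currently stored with estimates dir, and you can make a stored set of results active again with estimates restore:

* list the sets of estimation results stored in memory
estimates dir

* make the full model's results the active results again, if needed
estimates restore full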
In step five, we use lrtest to calculate a likelihood-ratio test comparing the reduced model to the full model.
. lrtest full reduced

Likelihood-ratio test
Assumption: reduced nested within full

 LR chi2(1) =  42.42
Prob > chi2 = 0.0000
The output reports a test statistic labeled LR chi2(1) and a p-value labeled Prob > chi2. The test statistic is our likelihood-ratio chi-squared and equals 42.42. The p-value is calculated from a chi-squared distribution with one degree of freedom and equals 0.0000. What does this mean?
Our full model included a coefficient for age-squared, while our reduced model did not. So our likelihood-ratio test is testing the null hypothesis that the age-squared coefficient equals 0. The test involves one coefficient, so it has one degree of freedom. The large chi-squared statistic and small p-value tell us that our result is not consistent with the null hypothesis.
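If you would like to verify the arithmetic, the statistic is twice the difference between the two models' log likelihoods, and the p-value is the upper tail of a chi-squared distribution with one degree of freedom. Here is a minimal sketch, assuming the estimates stored above as full and reduced are still in memory:

* retrieve the log likelihood saved by each stored model
estimates restore full
scalar ll_full = e(ll)

estimates restore reduced
scalar ll_reduced = e(ll)

* likelihood-ratio chi-squared statistic and its p-value
display "LR chi2(1)  = " 2*(ll_full - ll_reduced)
display "Prob > chi2 = " chi2tail(1, 2*(ll_full - ll_reduced))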
Our likelihood-ratio test required two assumptions. The first assumption is that the reduced model is nested within the full model. This means that all the coefficients in the reduced model are also in the full model. For example, our full model could include covariates such as x1, x2, x3, x4, and x5, and our reduced model could include x1, x2, and x3. But our reduced model may not include a covariate named x6 because x6 is not included in the full model.
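In code, using the hypothetical outcome y and covariates x1 through x6 from the example above, a valid nested comparison looks like this:

* full model with five covariates
regress y x1 x2 x3 x4 x5
estimates store full

* reduced model that uses a subset of the full model's covariates (nested)
regress y x1 x2 x3
estimates store reduced

lrtest full reduced

* a reduced model that added x6 would not be nested within full,
* because x6 does not appear in the full model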
The second assumption requires us to use the same sample for the full and the reduced models. Stata's regression commands require observations to have nonmissing data for every variable included in a model. This can result in different sample sizes for the full and reduced models. Let's look at an example with this issue and see how to deal with it. Our full model above was fit using 10,335 observations. Let's fit a reduced model that omits the variable hlthstat.
. regress bpsystol i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(4, 10344)     =    817.53
       Model |  1353111.75         4  338277.939   Prob > F        =    0.0000
    Residual |  4280168.29    10,344  413.782704   R-squared       =    0.2402
-------------+----------------------------------   Adj R-squared   =    0.2399
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.342

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |  -.8886553   4.994811    -0.18   0.859    -10.67945    8.902141
         age |   .0640567   .0865028     0.74   0.459    -.1055054    .2336188
             |
    diabetes#|
       c.age |
   Diabetic  |   .1403559   .0813022     1.73   0.084    -.0190122    .2997239
             |
 c.age#c.age |   .0061116   .0009246     6.61   0.000     .0042992    .0079239
       _cons |    111.823   1.812009    61.71   0.000     108.2711    115.3749
------------------------------------------------------------------------------
Our reduced model was fit using 10,349 observations. After we store these estimates, again under the name reduced, lrtest exits with an error message telling us that the sample sizes differ.
. estimates store reduced

. lrtest full reduced
observations differ: 10335 vs. 10349
r(498);
We can force the reduced model to use the same sample as the full model by adding the if e(sample) qualifier when we fit the reduced model. The function e(sample) marks the observations used by the most recent estimation command, so we first refit the full model and then fit the reduced model on only those observations.
. regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------
. regress bpsystol i.diabetes##c.age c.age#c.age if e(sample)

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    816.84
       Model |  1352846.11         4  338211.527   Prob > F        =    0.0000
    Residual |  4277127.24    10,330  414.049104   R-squared       =    0.2403
-------------+----------------------------------   Adj R-squared   =    0.2400
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.348

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |   -.889083   4.996564    -0.18   0.859    -10.68332    8.905151
         age |    .065459   .0865797     0.76   0.450    -.1042539     .235172
             |
    diabetes#|
       c.age |
   Diabetic  |   .1401007   .0813319     1.72   0.085    -.0193256     .299527
             |
 c.age#c.age |   .0061008   .0009255     6.59   0.000     .0042867    .0079149
       _cons |    111.795   1.813353    61.65   0.000     108.2404    115.3495
------------------------------------------------------------------------------

. estimates store reduced

. lrtest full reduced
The null hypothesis for this likelihood-ratio test is that the four hlthstat coefficients are simultaneously equal to 0, so the test has four degrees of freedom. The large chi-squared statistic and small p-value suggest that our results are inconsistent with the null hypothesis.
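An alternative to the if e(sample) approach, sketched here rather than run above, is to restrict every model to observations that are complete on all of the modeling variables; missing() returns 1 when any of its arguments is missing:

* fit both models on the common complete-case sample
regress bpsystol i.hlthstat i.diabetes##c.age c.age#c.age ///
    if !missing(bpsystol, hlthstat, diabetes, age)
estimates store full

regress bpsystol i.diabetes##c.age c.age#c.age ///
    if !missing(bpsystol, hlthstat, diabetes, age)
estimates store reduced

lrtest full reduced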
You can read more about factor-variable notation, storing estimates, likelihood-ratio tests, and the lrtest command by clicking on the links to the manual entries below. You can also watch a demonstration of these commands on YouTube by clicking on the link below.
Read more in the Stata Base Reference Manual: see [R] lrtest, [R] estimates store, and [R] regress. In the Stata User’s Guide, see [U] 11.4.3 Factor variables.