Stata's test, testparm, and testnl commands are powerful postestimation tools for conducting Wald tests after fitting regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, diabetes, hlthstat, and age.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol diabetes hlthstat age

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
diabetes        byte    %12.0g     diabetes   Diabetes status
hlthstat        byte    %20.0g     hlth       Health status
age             byte    %9.0g                 Age (years)

. summarize bpsystol diabetes hlthstat age

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    diabetes |     10,349    .0482172    .2142353          0          1
    hlthstat |     10,335    2.586164    1.206196          1          5
         age |     10,351    47.57965    17.21483         20         74
bpsystol measures systolic blood pressure (SBP) with a range of 65 to 300 mmHg; diabetes is an indicator variable for diabetes status with values of 0 and 1; hlthstat is a categorical variable with five categories of health status; and age measures age with a range of 20 to 74 years.
Let's first fit a linear regression model using the continuous outcome variable bpsystol and the predictor variables diabetes, hlthstat, and age. We use factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors and age is a continuous predictor, and we use the interaction operators # and ## to include main effects and the interaction of diabetes and age as well as age-squared.
. regress bpsystol i.diabetes##c.age c.age#c.age i.hlthstat

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |  -2.789364   4.999021    -0.56   0.577    -12.58841    7.009687
         age |   .0436002   .0865406     0.50   0.614    -.1260361    .2132365
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   .0812441     1.95   0.051    -.0007352    .3177732
             |
 c.age#c.age |   .0060262   .0009247     6.52   0.000     .0042137    .0078387
             |
    hlthstat |
  Very good  |    .829615    .576469     1.44   0.150    -.3003759    1.959606
       Good  |   2.438839   .5703592     4.28   0.000     1.320825    3.556854
       Fair  |   4.179397   .6809503     6.14   0.000     2.844602    5.514191
       Poor  |   3.100577    .905358     3.42   0.001       1.3259    4.875255
             |
       _cons |    111.268   1.832332    60.72   0.000     107.6763    114.8597
------------------------------------------------------------------------------
The output includes a Wald test for the null hypothesis that the age coefficient equals 0. While this might not be a very interesting hypothesis on its own because of the interaction term and squared term for age in the model, it is a good test to focus on as we explore the basic syntax for performing Wald tests using the test command. The t statistic equals 0.50 and the p-value equals 0.614. We can replicate this Wald test by using the test command.
. test age = 0

 ( 1)  age = 0

       F(  1, 10326) =    0.25
            Prob > F =    0.6144
The test output reports an F statistic of 0.25 and a p-value of 0.6144. Note that regress reports a t statistic and test reports an F statistic, but the p-values are essentially the same. Statistical theory tells us that the square of a random variable with a t distribution has an F distribution with one numerator degree of freedom and the same denominator degrees of freedom as the t statistic. So, the square of the t statistic is equivalent to the F statistic.
. display "The square of the t statistic is " 0.5^2
The square of the t statistic is .25
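We can confirm the same relationship at full precision using the coefficient and standard error reported by regress. Here is a quick Python check (Python simply stands in for Stata's display; the two numbers are taken from the regress output above):

```python
# Reproduce the single-coefficient Wald test by hand, using the age
# coefficient and standard error from the regress output above.
b_age = 0.0436002   # coefficient on age
se_age = 0.0865406  # standard error on age

t = b_age / se_age  # the t statistic reported by regress
F = t ** 2          # the F(1, 10326) statistic reported by test

print(round(t, 2))  # 0.5
print(round(F, 2))  # 0.25
```

Rounded to the precision shown in the output, the two statistics match the reported t of 0.50 and F of 0.25.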
We could use test to compute a Wald test for the null hypothesis that the age coefficient equals 0.05 rather than 0.
. test age = 0.05

 ( 1)  age = .05

       F(  1, 10326) =    0.01
            Prob > F =    0.9411
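For a single coefficient, this test is just the squared standardized distance between the estimate and the hypothesized value, ((b - 0.05)/se)². A quick Python sketch with the values from the regress output above:

```python
# Wald test of H0: b_age = 0.05 computed by hand.
b_age = 0.0436002   # coefficient on age from regress
se_age = 0.0865406  # its standard error
b0 = 0.05           # hypothesized value

F = ((b_age - b0) / se_age) ** 2  # F(1, 10326) statistic
print(round(F, 2))  # 0.01
```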
Our model includes coefficients for both age and the square of age. We can use test to conduct a Wald test of the null hypothesis that both the age and the age-squared coefficients are simultaneously equal to 0. We can refer to the coefficients by using the _b[] notation. Because we used factor-variable notation with regress, it is easiest to find the names to place inside the brackets by adding the coeflegend option to our regression model.
. regress bpsystol i.diabetes##c.age c.age#c.age i.hlthstat, coeflegend

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(8, 10326)     =    415.86
       Model |  1371889.87         8  171486.233   Prob > F        =    0.0000
    Residual |  4258083.48    10,326  412.365242   R-squared       =    0.2437
-------------+----------------------------------   Adj R-squared   =    0.2431
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.307

------------------------------------------------------------------------------
    bpsystol | Coefficient  Legend
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |  -2.789364   _b[1.diabetes]
         age |   .0436002   _b[age]
             |
    diabetes#|
       c.age |
   Diabetic  |    .158519   _b[1.diabetes#c.age]
             |
 c.age#c.age |   .0060262   _b[c.age#c.age]
             |
    hlthstat |
  Very good  |    .829615   _b[2.hlthstat]
       Good  |   2.438839   _b[3.hlthstat]
       Fair  |   4.179397   _b[4.hlthstat]
       Poor  |   3.100577   _b[5.hlthstat]
             |
       _cons |    111.268   _b[_cons]
------------------------------------------------------------------------------
Now we can refer to the coefficients in our test command by using this notation. The individual tests are nested within parentheses.
. test (_b[age] = 0) (_b[c.age#c.age] = 0)

 ( 1)  age = 0
 ( 2)  c.age#c.age = 0

       F(  2, 10326) =  1140.12
            Prob > F =    0.0000
The output reminds us that we are testing two hypotheses simultaneously and reports the F statistic and p-value (F(2, 10326) = 1140.12, p-value = 0.0000).
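Under the hood, a joint Wald test is the quadratic form W = b'V⁻¹b over the restricted coefficients (with V the corresponding block of the coefficient covariance matrix), divided by the number of restrictions q to give the F statistic. A minimal Python sketch for the two-coefficient case; note that the covariance entries passed in below are illustrative placeholders, not values from the model above (Stata takes the real matrix from e(V)):

```python
# Sketch of the joint Wald F statistic for H0: b1 = b2 = 0,
# with the 2x2 inverse written out explicitly.
def wald_F_2coef(b1, b2, v11, v22, v12):
    """F = (b' V^-1 b) / q for q = 2 restrictions."""
    det = v11 * v22 - v12 * v12  # determinant of the 2x2 covariance block
    # quadratic form b' V^-1 b
    W = (b1 * b1 * v22 - 2 * b1 * b2 * v12 + b2 * b2 * v11) / det
    return W / 2  # divide by q = 2 restrictions

# With unit variances and zero covariance, the statistic reduces to
# (b1^2 + b2^2) / 2:
print(wald_F_2coef(1.0, 2.0, 1.0, 1.0, 0.0))  # 2.5
```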
We could have conducted the same test by using the testparm command with factor-variable notation.
. testparm c.age c.age#c.age

 ( 1)  age = 0
 ( 2)  c.age#c.age = 0

       F(  2, 10326) =  1140.12
            Prob > F =    0.0000
Given the equivalent results, choosing between test and testparm depends on your goals and the simplicity of the syntax. Both test and testparm will test the null hypothesis that the coefficient or coefficients equal 0, but test allows us to test that the coefficients equal values other than 0 while testparm does not.
The syntax for testparm is often shorter and simpler than the syntax for test. Let's check out this example: There are four coefficients for the hlthstat indicator variables. The t statistics reported in the regression output test that the coefficient for each indicator variable is equal to 0. But we often wish to test the null hypothesis that all four coefficients are simultaneously equal to 0. We could do this with test using the following syntax:
. test _b[2.hlthstat] = _b[3.hlthstat] = _b[4.hlthstat] = _b[5.hlthstat] = 0

 ( 1)  2.hlthstat - 3.hlthstat = 0
 ( 2)  2.hlthstat - 4.hlthstat = 0
 ( 3)  2.hlthstat - 5.hlthstat = 0
 ( 4)  2.hlthstat = 0

       F(  4, 10326) =   11.55
            Prob > F =    0.0000
But, the syntax for the equivalent test using testparm is much shorter and simpler:
. testparm i.hlthstat

 ( 1)  2.hlthstat = 0
 ( 2)  3.hlthstat = 0
 ( 3)  4.hlthstat = 0
 ( 4)  5.hlthstat = 0

       F(  4, 10326) =   11.55
            Prob > F =    0.0000
Also easy with testparm is testing that the coefficients equal 0 for multiple variables simultaneously. The example below tests the null hypothesis that the coefficients for hlthstat and diabetes are all simultaneously equal to 0.
. testparm i.hlthstat i.diabetes

 ( 1)  1.diabetes = 0
 ( 2)  2.hlthstat = 0
 ( 3)  3.hlthstat = 0
 ( 4)  4.hlthstat = 0
 ( 5)  5.hlthstat = 0

       F(  5, 10326) =    9.24
            Prob > F =    0.0000
We can also test hypotheses about linear combinations of coefficients by using test and nonlinear combinations of coefficients by using testnl. For example, we could use test to test that the difference between the coefficients for the “Poor” and “Fair” categories of hlthstat equals 0.
. test _b[5.hlthstat] - _b[4.hlthstat] = 0

 ( 1)  - 4.hlthstat + 5.hlthstat = 0

       F(  1, 10326) =    1.42
            Prob > F =    0.2339
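For a single linear combination, the Wald statistic is F = (b5 - b4)² / Var(b5 - b4), where Var(b5 - b4) = Var(b5) + Var(b4) - 2 Cov(b5, b4). A Python sketch with the coefficients and standard errors from the regress output; the covariance is an illustrative placeholder rather than the value from e(V), so the result here will not reproduce Stata's F of 1.42 exactly:

```python
# Wald test of H0: b5 - b4 = 0 computed by hand.
b4, b5 = 4.179397, 3.100577     # Fair and Poor coefficients from regress
se4, se5 = 0.6809503, 0.905358  # their standard errors
cov45 = 0.3                     # placeholder covariance, NOT from e(V)

var_diff = se5**2 + se4**2 - 2 * cov45   # Var(b5 - b4)
F = (b5 - b4) ** 2 / var_diff            # F(1, 10326) statistic
print(round(F, 2))  # 1.7
```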
Or we could use testnl to test the null hypothesis that the ratio of the coefficients for the “Poor” and “Fair” categories of hlthstat equals 1.
. testnl _b[5.hlthstat] / _b[4.hlthstat] = 1

  (1)  _b[5.hlthstat] / _b[4.hlthstat] = 1

               chi2(1) =        1.59
           Prob > chi2 =        0.2073
Note that test reported an F statistic while testnl reported a chi-squared statistic. The numbers are slightly different, but the conclusion about the null hypothesis is the same. You can read more about the relationship between F and chi-squared statistics here: How are the chi-squared and F distributions related?
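testnl linearizes the nonlinear expression with the delta method: for r = b5/b4, the gradient is (dr/db5, dr/db4) = (1/b4, -b5/b4²), and Var(r) ≈ g'Vg. The sketch below uses the coefficients and standard errors from the regress output, but the covariance term is again an illustrative placeholder rather than the value from e(V), so it will not reproduce Stata's chi-squared of 1.59 exactly:

```python
# Delta-method sketch of what testnl does for H0: b5/b4 = 1.
b4, b5 = 4.179397, 3.100577            # Fair and Poor coefficients
v44, v55 = 0.6809503**2, 0.905358**2   # their variances (se squared)
v45 = 0.3                              # placeholder covariance, NOT from e(V)

g5 = 1.0 / b4      # dr/db5
g4 = -b5 / b4**2   # dr/db4
var_r = g5**2 * v55 + g4**2 * v44 + 2 * g5 * g4 * v45  # Var(b5/b4)

r = b5 / b4
chi2 = (r - 1.0) ** 2 / var_r  # one-restriction Wald chi-squared
print(round(chi2, 2))
```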
You can read more about factor-variable notation, test, testparm, and testnl by clicking on the links to the manual entries below. You can also watch a demonstration of these commands on YouTube by clicking on the link below.
Watch Wald tests in Stata.
Read more in the Stata Base Reference Manual: see [R] test, [R] testparm, [R] testnl, and [R] regress. In the Stata User’s Guide, see [U] 13.5 Accessing coefficients and standard errors and [U] 11.4.3 Factor variables.