
Why do estimation commands sometimes omit variables?

Title   Estimation commands and omitted variables
Author James Hardin, StataCorp

When you run a regression (or another estimation command) and the estimation routine omits a variable, it does so because of an exact linear dependency (collinearity) among the independent variables in the proposed model. You can identify this dependency by running a regression with the omitted variable as the dependent variable and the remaining variables as the independent variables. Below, we create a dependency on purpose to illustrate:

. sysuse auto
(1978 automobile data)

. generate newvar = price + 2.4*weight - 1.2*displ

. regress trunk price weight mpg foreign newvar displ
note: weight omitted because of collinearity.

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(5, 68)        =     12.03
       Model |  626.913967         5  125.382793   Prob > F        =    0.0000
    Residual |  708.707655        68  10.4221714   R-squared       =    0.4694
-------------+----------------------------------   Adj R-squared   =    0.4304
       Total |  1335.62162        73  18.2961866   Root MSE        =    3.2283

------------------------------------------------------------------------------
       trunk | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.0017329   .0006706    -2.58   0.012    -.0030711   -.0003947
      weight |          0  (omitted)
         mpg |  -.0709254   .1125374    -0.63   0.531    -.2954903    .1536395
     foreign |   1.374419   1.287406     1.07   0.289    -1.194561    3.943399
      newvar |   .0015145   .0005881     2.58   0.012     .0003411     .002688
displacement |    .007182   .0092692     0.77   0.441    -.0113143    .0256783
       _cons |   4.170958   5.277511     0.79   0.432    -6.360151    14.70207
------------------------------------------------------------------------------
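
As an aside, the programmer's command _rmcoll (documented in [P] _rmcoll) offers another way to ask which variables in a list are collinear without fitting the full model. The lines below are a minimal sketch, assuming the variables created above; the exact contents of the returned r(varlist) vary across Stata versions:

. * flag collinear variables directly, without fitting the full model
. _rmcoll price weight mpg foreign newvar displ

. return list

This tells you which variable would be omitted; the auxiliary regression described next is what reveals the form of the dependency.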

The regression omitted one of the variables involved in the dependency we created. Which variable it omits is somewhat arbitrary, but it will always be one of the variables in the dependency. To find out what that dependency is, rerun the regression using the omitted variable as the dependent variable and the remaining independent variables from the original regression as the regressors:

. regress weight price mpg foreign newvar displ

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(5, 68)        >  99999.00
       Model |  44094178.4         5  8818835.68   Prob > F        =    0.0000
    Residual |  6.9847e-07        68  1.0272e-08   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =    1.0000
       Total |  44094178.4        73  604029.841   Root MSE        =    .0001

------------------------------------------------------------------------------
      weight | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.4166667   2.11e-08 -2.0e+07   0.000    -.4166667   -.4166667
         mpg |   4.40e-06   3.53e-06     1.25   0.217    -2.65e-06    .0000115
     foreign |    .000041   .0000404     1.02   0.314    -.0000396    .0001217
      newvar |   .4166667   1.85e-08  2.3e+07   0.000     .4166667    .4166667
displacement |   .4999999   2.91e-07  1.7e+06   0.000     .4999993    .5000005
       _cons |  -.0002082   .0001657    -1.26   0.213    -.0005388    .0001224
------------------------------------------------------------------------------

The regression in which the omitted variable is the dependent variable has an R-squared of 1.0000, and its residual sum of squares is zero (well, nearly). Moreover, its coefficients show how weight is related to price, newvar, and displacement. The output of this regression tells us that we have the dependency

weight = -.4166667*price + .4166667*newvar + .4999999*displacement 

which is equivalent to the dependency that we defined above: solving newvar = price + 2.4*weight - 1.2*displacement for weight gives weight = -(1/2.4)*price + (1/2.4)*newvar + (1.2/2.4)*displacement, that is, weight = -.4166667*price + .4166667*newvar + .5*displacement.
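
As a quick check on that algebra, you can rebuild weight from the recovered coefficients and confirm that it matches the stored variable. The lines below are an illustrative sketch, not part of the original analysis: check is a hypothetical scratch variable, and a small tolerance is allowed because newvar was created as a float:

. * rebuild weight from the recovered dependency in a scratch variable
. generate double check = -price/2.4 + newvar/2.4 + .5*displ

. assert reldif(weight, check) < 1e-5

. drop check

If the assert runs without complaint, the reconstructed relationship reproduces weight up to rounding, which confirms that the omitted-variable note came from the dependency we built in.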