Why do estimation commands sometimes omit variables?
Title: Estimation commands and omitted variables
Author: James Hardin, StataCorp
Date: August 1996; minor revisions July 2011
When you run a regression (or another estimation command) and the estimation
routine omits a variable, it does so because of a dependency among the
independent variables in the proposed model. One of them is (to within
numerical precision) a linear combination of the others, so the coefficients
cannot all be estimated. You can identify this dependency by running a
regression where you specify the omitted variable as the dependent variable
and the remaining variables as the independent variables. Below we generate a
dependency on purpose to illustrate:
. sysuse auto
(1978 Automobile Data)
. generate newvar = price + 2.4*weight - 1.2*displ
. regress trunk price weight mpg foreign newvar displ
note: weight omitted because of collinearity
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =   12.03
       Model |  626.913967     5  125.382793           Prob > F      =  0.0000
    Residual |  708.707655    68  10.4221714           R-squared     =  0.4694
-------------+------------------------------           Adj R-squared =  0.4304
       Total |  1335.62162    73  18.2961866           Root MSE      =  3.2283
------------------------------------------------------------------------------
       trunk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.0017329   .0006706    -2.58   0.012    -.0030711   -.0003947
      weight |          0  (omitted)
         mpg |  -.0709254   .1125374    -0.63   0.531    -.2954903    .1536395
     foreign |   1.374419   1.287406     1.07   0.289    -1.194561    3.943399
      newvar |   .0015145   .0005881     2.58   0.012     .0003411     .002688
displacement |    .007182   .0092692     0.77   0.441    -.0113143    .0256783
       _cons |   4.170958   5.277511     0.79   0.432    -6.360151    14.70207
------------------------------------------------------------------------------
The regression omitted one of the variables in the dependency that we
created. Which of those variables it omits is somewhat arbitrary, but it
always omits one of them. To find out what the dependency is, we can run a
regression with the omitted variable as the dependent variable and the
remaining independent variables from the original regression as the
independent variables.
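The claim that the choice is arbitrary, but always falls on a member of the dependency, can be illustrated outside Stata. The sketch below is written in Python with NumPy and is not Stata's algorithm. It swaps in a simple greedy rule: visit the regressors in a given order and drop any variable whose auxiliary regression on the constant and the variables kept so far has an R-squared of 1. It then runs the auxiliary regression of weight on the rest. The data are synthetic random stand-ins for the auto variables, so the numbers will not match the Stata output. The helpers `aux_regression` and `screen_collinear`, the tolerance, the seed, and the variable ranges are all hypothetical.

```python
import numpy as np

def aux_regression(y, X):
    """OLS of y on the columns of X (X must already contain a constant).
    Returns (coefficients, R-squared)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

def screen_collinear(data, order, tol=1e-9):
    """Hypothetical greedy rule, NOT Stata's algorithm: visit the regressors
    in `order` and drop any variable whose auxiliary regression on the
    constant plus the variables kept so far has R-squared within `tol` of 1."""
    n = len(data[order[0]])
    kept, dropped = [], []
    for name in order:
        X = np.column_stack([np.ones(n)] + [data[k] for k in kept])
        _, r2 = aux_regression(data[name], X)
        if r2 > 1.0 - tol:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped

# Synthetic stand-ins for the auto variables (hypothetical ranges and seed).
rng = np.random.default_rng(0)
n = 74
data = {
    "price": rng.uniform(3000, 16000, n),
    "weight": rng.uniform(1700, 4900, n),
    "mpg": rng.uniform(12, 41, n),
    "foreign": rng.integers(0, 2, n).astype(float),
    "displ": rng.uniform(80, 430, n),
}
# Same construction as in the Stata example: an exact linear dependency.
data["newvar"] = data["price"] + 2.4 * data["weight"] - 1.2 * data["displ"]

# The dropped variable depends on the order, but is always in the dependency.
for order in (["price", "weight", "mpg", "foreign", "newvar", "displ"],
              ["newvar", "displ", "price", "weight", "mpg", "foreign"]):
    kept, dropped = screen_collinear(data, order)
    print("dropped:", dropped)

# Auxiliary regression of weight on the remaining regressors.
rest = ["price", "mpg", "foreign", "newvar", "displ"]
X = np.column_stack([np.ones(n)] + [data[k] for k in rest])
beta, r2 = aux_regression(data["weight"], X)
print("R-squared:", r2)
print(dict(zip(["_cons"] + rest, np.round(beta, 4))))
```

The first ordering is the one used in the Stata command. Under this rule it drops displ, and the second ordering drops weight. Stata dropped weight from the first ordering, so the rule illustrates the principle rather than reproducing Stata's choice. The auxiliary regression should report an R-squared of 1, with coefficients near -0.4167 on price, 0.4167 on newvar, and 0.5 on displ, and near 0 on mpg, foreign, and the constant.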
. regress weight price mpg foreign newvar displ
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =       .
       Model |  44094178.4     5  8818835.68           Prob > F      =  0.0000
    Residual |  6.9847e-07    68  1.0272e-08           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  44094178.4    73  604029.841           Root MSE      =   .0001
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.4166667   2.11e-08 -2.0e+07   0.000    -.4166667   -.4166667
         mpg |   4.40e-06   3.53e-06     1.25   0.217    -2.65e-06    .0000115
     foreign |    .000041   .0000404     1.02   0.314    -.0000396    .0001217
      newvar |   .4166667   1.85e-08  2.3e+07   0.000     .4166667    .4166667
displacement |   .4999999   2.91e-07  1.7e+06   0.000     .4999993    .5000005
       _cons |  -.0002082   .0001657    -1.26   0.213    -.0005388    .0001224
------------------------------------------------------------------------------
The regression with the omitted variable as the dependent variable has an
R-squared of 1.0000. Its residual sum of squares is zero (well, nearly:
6.9847e-07). Its coefficients also show the relationship between weight and
the price, newvar, and displ variables. The coefficients on mpg and foreign
are effectively zero because those variables play no part in the dependency.
The output of this regression tells us that we have the dependency
weight = -.4166667*price + .4166667*newvar + .4999999*displacement
which is equivalent to the dependency that we defined above.
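Solving the generating equation for weight shows the equivalence term by term (1/2.4 = 5/12 ≈ .4166667 and 1.2/2.4 = .5):

```latex
\begin{aligned}
\text{newvar} &= \text{price} + 2.4\,\text{weight} - 1.2\,\text{displ} \\
2.4\,\text{weight} &= \text{newvar} - \text{price} + 1.2\,\text{displ} \\
\text{weight} &= -\tfrac{5}{12}\,\text{price} + \tfrac{5}{12}\,\text{newvar} + \tfrac{1}{2}\,\text{displ} \\
&\approx -.4166667\,\text{price} + .4166667\,\text{newvar} + .5\,\text{displ}
\end{aligned}
```

The estimated .4999999 differs from .5 only by rounding; its reported 95% confidence interval, .4999993 to .5000005, contains .5.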