
## Why do estimation commands sometimes omit variables?

Title: Estimation commands and omitted variables

Author: James Hardin, StataCorp

When you run a regression (or other estimation command) and the estimation routine omits a variable, it does so because of a linear dependency (collinearity) among the independent variables in the proposed model. You can identify this dependency by regressing the omitted variable on the remaining independent variables. Below we create a dependency on purpose to illustrate:

. sysuse auto
(1978 Automobile Data)

. generate newvar = price + 2.4*weight - 1.2*displ

. regress trunk price weight mpg foreign newvar displ
note: weight omitted because of collinearity

          Source |       SS       df       MS              Number of obs =      74
                                                           F(  5,    68) =   12.03
           Model |  626.913967     5  125.382793           Prob > F      =  0.0000
        Residual |  708.707655    68  10.4221714           R-squared     =  0.4694
           Total |  1335.62162    73  18.2961866           Root MSE      =  3.2283

           trunk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
           price |  -.0017329   .0006706    -2.58   0.012    -.0030711   -.0003947
          weight |          0  (omitted)
             mpg |  -.0709254   .1125374    -0.63   0.531    -.2954903    .1536395
         foreign |   1.374419   1.287406     1.07   0.289    -1.194561    3.943399
          newvar |   .0015145   .0005881     2.58   0.012     .0003411     .002688
    displacement |    .007182   .0092692     0.77   0.441    -.0113143    .0256783
           _cons |   4.170958   5.277511     0.79   0.432    -6.360151    14.70207

The regression omitted one of the variables involved in the dependency that we created. Which variable it omits is somewhat arbitrary, but it will always be one of the variables in the dependency. To find out what that dependency is, we regress the omitted variable on the remaining independent variables from the original regression.

. regress weight price mpg foreign newvar displ

          Source |       SS       df       MS              Number of obs =      74
                                                           F(  5,    68) =       .
           Model |  44094178.4     5  8818835.68           Prob > F      =  0.0000
        Residual |  6.9847e-07    68  1.0272e-08           R-squared     =  1.0000
           Total |  44094178.4    73  604029.841           Root MSE      =   .0001

          weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
           price |  -.4166667   2.11e-08 -2.0e+07   0.000    -.4166667   -.4166667
             mpg |   4.40e-06   3.53e-06     1.25   0.217    -2.65e-06    .0000115
         foreign |    .000041   .0000404     1.02   0.314    -.0000396    .0001217
          newvar |   .4166667   1.85e-08  2.3e+07   0.000     .4166667    .4166667
    displacement |   .4999999   2.91e-07  1.7e+06   0.000     .4999993    .5000005
           _cons |  -.0002082   .0001657    -1.26   0.213    -.0005388    .0001224

The regression with the omitted variable as the dependent variable has an R-squared of 1.0000, and its residual sum of squares is zero (well, nearly: 6.98e-07). The coefficients on mpg, foreign, and the constant are effectively zero, while the coefficients on price, newvar, and displ are not, so weight is an exact linear combination of those three variables. The output of this regression tells us that we have the dependency

weight = -.4166667*price + .4166667*newvar + .4999999*displacement


which is equivalent to the dependency that we defined above.
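To make the equivalence explicit, the definition of newvar from the `generate` command can be solved for weight. The short derivation below is a worked check rather than part of the original output; the decimal values are rounded, and the fitted coefficient of .4999999 on displacement differs from 0.5 only through floating-point rounding.

```latex
\begin{aligned}
\text{newvar} &= \text{price} + 2.4\,\text{weight} - 1.2\,\text{displacement} \\
2.4\,\text{weight} &= -\text{price} + \text{newvar} + 1.2\,\text{displacement} \\
\text{weight} &= -\tfrac{1}{2.4}\,\text{price} + \tfrac{1}{2.4}\,\text{newvar} + \tfrac{1.2}{2.4}\,\text{displacement} \\
&\approx -0.4166667\,\text{price} + 0.4166667\,\text{newvar} + 0.5\,\text{displacement}
\end{aligned}
```

The rearranged coefficients match the estimates on price, newvar, and displacement in the second regression.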