Title: Keeping all levels of a variable in the model

Author: Kenneth Higbee, StataCorp

Date: August 2009

In the following example, we use **regress** as our estimation command, but
the same approach applies to other estimation commands that have a
**noconstant** option.

You might try

. sysuse auto, clear
(1978 Automobile Data)

. regress mpg i.rep78, noconstant

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    65) =  188.12
       Model |  30942.2129     4  7735.55322           Prob > F      =  0.0000
    Residual |  2672.78712    65  41.1198019           R-squared     =  0.9205
-------------+------------------------------           Adj R-squared =  0.9156
       Total |       33615    69  487.173913           Root MSE      =  6.4125

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------

and then wonder why the first level of **rep78** does not appear in your
regression table. If you add the **baselevels** option to your regression
command, you will see that the first level is considered a base level and has
been omitted from the model.

. regress mpg i.rep78, noconstant baselevels

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    65) =  188.12
       Model |  30942.2129     4  7735.55322           Prob > F      =  0.0000
    Residual |  2672.78712    65  41.1198019           R-squared     =  0.9205
-------------+------------------------------           Adj R-squared =  0.9156
       Total |       33615    69  487.173913           Root MSE      =  6.4125

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  (base)
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
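It may help to write out what this specification fits. The notation below is introduced here for illustration and is not part of the original FAQ: $d_{ik}$ is an indicator that car $i$ has **rep78** equal to $k$, and $\beta_k$ is the coefficient on level $k$.

```latex
\[
\mathrm{E}(\mathtt{mpg}_i) = \sum_{k=2}^{5} \beta_k \, d_{ik},
\qquad
d_{ik} =
\begin{cases}
1 & \text{if } \mathtt{rep78}_i = k \\
0 & \text{otherwise}
\end{cases}
\]
```

With no constant and no term for level 1, every indicator is 0 for a car whose **rep78** is 1, so the fitted value for those cars is forced to 0. That is rarely the model you intended.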

The **ibn.** factor-variable operator specifies that a categorical variable
should be treated as if it has no base, or, in other words, that all levels of
the categorical variable are to be included in the model; see
[U] **11.4.3 Factor variables**.
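For comparison, here is a brief sketch of the related base-level operators, using the same auto data. The choice of level 3 as a base is arbitrary and made only for illustration.

```stata
* Assumes the auto data are already in memory (sysuse auto, clear).

* Default: i.rep78 uses the smallest level (1) as the base
regress mpg i.rep78

* ib3. makes level 3 the base (an arbitrary choice for this sketch)
regress mpg ib3.rep78

* ib(last). makes the largest level (5) the base
regress mpg ib(last).rep78
```

Each of these keeps the constant and drops exactly one level, so they are reparameterizations of the same fit. Only **ibn.** asks for no level to be dropped.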

What happens when you specify that **rep78** should have no base level but
leave the constant in the model?

. regress mpg ibn.rep78
note: 5.rep78 omitted because of collinearity

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    64) =    4.91
       Model |  549.415777     4  137.353944           Prob > F      =  0.0016
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.2348
-------------+------------------------------           Adj R-squared =  0.1869
       Total |   2340.2029    68  34.4147485           Root MSE      =  5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  -6.363636   4.066234    -1.56   0.123    -14.48687    1.759599
          2  |  -8.238636   2.457918    -3.35   0.001    -13.14889    -3.32838
          3  |  -7.930303    1.86452    -4.25   0.000    -11.65511   -4.205497
          4  |   -5.69697    2.02441    -2.81   0.006    -9.741193   -1.652747
          5  |  (omitted)
             |
       _cons |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------

One of the levels of **rep78** is omitted from the model despite your
request that there be no base level for **rep78**. If a model contains both
the constant and all levels of a categorical variable, something must be
dropped, because the indicators for the levels sum to the constant and are
therefore collinear with it.
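The collinearity can be stated in one line. Writing $d_{ik}$ for the indicator that car $i$ has **rep78** equal to $k$ (notation introduced here, not the FAQ's own), every car in the estimation sample falls in exactly one level, so:

```latex
\[
\sum_{k=1}^{5} d_{ik} = 1 \quad \text{for every observation } i
\]
```

The column of ones that represents the constant is therefore an exact linear combination of the five indicator columns, and the six coefficients cannot all be identified. With level 5 omitted, **_cons** estimates the mean of **mpg** for level 5, and each remaining coefficient is a difference from that mean. For level 2, for example, 27.36364 + (-8.238636) is approximately 19.125, which matches the level 2 coefficient in the no-constant fit above.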

To obtain a cell means model, in which each coefficient is the mean of the
dependent variable within one level, you need both the **ibn.** operator on
your categorical variable and the **noconstant** option on your estimation
command.

. regress mpg ibn.rep78, noconstant

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  5,    64) =  227.47
       Model |  31824.2129     5  6364.84258           Prob > F      =  0.0000
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.9467
-------------+------------------------------           Adj R-squared =  0.9426
       Total |       33615    69  487.173913           Root MSE      =  5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |         21   3.740391     5.61   0.000     13.52771    28.47229
          2  |     19.125   1.870195    10.23   0.000     15.38886    22.86114
          3  |   19.43333   .9657648    20.12   0.000       17.504    21.36267
          4  |   21.66667   1.246797    17.38   0.000      19.1759    24.15743
          5  |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------
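One way to confirm that these coefficients are the cell means is to compute the mean of **mpg** within each level of **rep78** directly. The following is a minimal sketch, assuming the auto data are still in memory. It is not part of the original FAQ.

```stata
* Mean of mpg and number of cars within each level of rep78;
* the means should match the coefficients of the cell means model
tabstat mpg, by(rep78) statistics(mean n) nototal

* The same means, recovered as predictive margins from the
* default parameterization (constant included, level 1 as base)
regress mpg i.rep78
margins rep78
```

The second approach shows that the default parameterization and the cell means model describe the same fit. They differ only in how the coefficients are expressed.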