How do I keep all levels of my categorical variable in my model?
How do I specify a cell means model?
Title: Keeping all levels of a variable in the model
Author: Kenneth Higbee, StataCorp
In the following example, we use regress as our estimation command, but the
same applies to other estimation commands that have a noconstant option.
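You might type

. regress mpg i.rep78, noconstant

see the following coefficient table,

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
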
and then wonder why the first level of rep78 does not appear in your
regression table. If you add the baselevels option to your regression
command, you will see that the first level is considered a base level and has
been omitted from the model.
. regress mpg i.rep78, noconstant baselevels
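
      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 65)        =    188.12
       Model |  30942.2129         4  7735.55322   Prob > F        =    0.0000
    Residual |  2672.78712        65  41.1198019   R-squared       =    0.9205
-------------+----------------------------------   Adj R-squared   =    0.9156
       Total |       33615        69  487.173913   Root MSE        =    6.4125

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |          0  (base)
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
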
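If you would rather see base levels in every coefficient table without typing baselevels each time, the set showbaselevels setting is one option. The lines below are a minimal sketch; they assume the output above comes from the auto dataset shipped with Stata (loaded with sysuse auto), which the text does not state explicitly.

```stata
* Assumption: the example uses Stata's shipped auto dataset
sysuse auto, clear

* Show base levels in coefficient tables for the rest of the session
set showbaselevels on
regress mpg i.rep78, noconstant

* Restore the default display
set showbaselevels off
```

This changes only how the table is displayed. Level 1 is still the base and is still omitted from the fit.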
The ibn. factor-variable operator specifies that a categorical variable
should be treated as if it has no base, or, in other words, that all levels of
the categorical variable are to be included in the model; see
[U] 11.4.3 Factor variables.
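The no-base setting can also be attached to the variable itself with fvset base none, after which plain i.rep78 includes every level. The following is a minimal sketch, assuming the auto dataset shipped with Stata; the noconstant option matches the commands used earlier.

```stata
* Assumption: the example uses Stata's shipped auto dataset
sysuse auto, clear

* Declare that rep78 has no base level when used as a factor variable
fvset base none rep78

* With that setting, i.rep78 behaves like ibn.rep78
regress mpg i.rep78, noconstant

* Remove the setting so rep78 returns to the default base level
fvset clear rep78
```

The ibn. operator affects a single command, whereas fvset stores the choice with the variable until it is cleared.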
What happens when you specify that rep78 should have no base level but
leave the constant in the model?
. regress mpg ibn.rep78
note: 5.rep78 omitted because of collinearity
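
      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      4.91
       Model |  549.415777         4  137.353944   Prob > F        =    0.0016
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.2348
-------------+----------------------------------   Adj R-squared   =    0.1869
       Total |   2340.2029        68  34.4147485   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  -6.363636   4.066234    -1.56   0.123    -14.48687    1.759599
          2  |  -8.238636   2.457918    -3.35   0.001    -13.14889    -3.32838
          3  |  -7.930303    1.86452    -4.25   0.000    -11.65511   -4.205497
          4  |   -5.69697    2.02441    -2.81   0.006    -9.741193   -1.652747
          5  |          0  (omitted)
             |
       _cons |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------
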
One of the levels of rep78 is omitted from the model despite your
request that there be no base level for rep78. If you have the
constant and all levels of a categorical variable in a model, something must
be dropped because of the collinearity between all the levels and the
constant.
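The collinearity can be written out explicitly. The notation below (indicators $d_{ik}$, coefficients $\beta_k$, and cell means $\mu_k$) is introduced here for illustration and does not appear in the original text.

```latex
% Indicator for level k of rep78 in observation i
\[
  d_{ik} =
  \begin{cases}
    1 & \text{if } \mathtt{rep78}_i = k \\
    0 & \text{otherwise}
  \end{cases}
  \qquad\Longrightarrow\qquad
  \sum_{k=1}^{5} d_{ik} = 1 \quad \text{for every observation } i .
\]
% Model with a constant and all five indicators
\[
  \mathtt{mpg}_i = \beta_0 + \sum_{k=1}^{5} \beta_k d_{ik} + \varepsilon_i
\]
% With level 5 omitted (\beta_5 = 0), the remaining coefficients are
\[
  \beta_0 = \mu_5 , \qquad \beta_k = \mu_k - \mu_5 \quad (k = 1, \dots, 4),
\]
% where \mu_k denotes the mean of mpg within level k.
```

Because the five indicators sum to the constant regressor, the six coefficients are not separately identified and one term must be dropped. In the output above, _cons (27.36364) is the mean of mpg for level 5, and each remaining coefficient is a difference from that mean.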
You need to use the ibn. operator on your categorical variable and the
noconstant option on your estimation command to obtain a cell means
model.
. regress mpg ibn.rep78, noconstant
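
      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(5, 64)        =    227.47
       Model |  31824.2129         5  6364.84258   Prob > F        =    0.0000
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.9467
-------------+----------------------------------   Adj R-squared   =    0.9426
       Total |       33615        69  487.173913   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------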
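
In a cell means model, each coefficient is the sample mean of the dependent variable within the corresponding level of the categorical variable. One way to check this is sketched below, assuming the auto dataset shipped with Stata. The tabulate, lincom, and test lines are illustrative additions and are not part of the original example.

```stata
* Assumption: the example uses Stata's shipped auto dataset
sysuse auto, clear

* Cell means model: all levels of rep78, no constant
regress mpg ibn.rep78, noconstant

* Group means of mpg by rep78; these should match the coefficients
tabulate rep78, summarize(mpg)

* Difference between two cell means (level 5 minus level 1)
lincom 5.rep78 - 1.rep78

* Joint test that all five cell means are equal
test 1.rep78 = 2.rep78 = 3.rep78 = 4.rep78 = 5.rep78
```

The lincom estimate should agree in magnitude with the 1.rep78 coefficient from the model that kept the constant and omitted level 5. The joint test should reproduce the F(4, 64) statistic reported for that model.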