How do I keep all levels of my categorical variable in my model?
How do I specify a cell means model?
Title: Keeping all levels of a variable in the model
Author: Kenneth Higbee, StataCorp
Date: August 2009
In the following example, we use regress as our estimation command, but the
same approach applies to other estimation commands that have a noconstant
option.
You might try
. sysuse auto, clear
(1978 Automobile Data)
. regress mpg i.rep78, noconstant
      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    65) =  188.12
       Model |  30942.2129     4  7735.55322           Prob > F      =  0.0000
    Residual |  2672.78712    65  41.1198019           R-squared     =  0.9205
-------------+------------------------------           Adj R-squared =  0.9156
       Total |       33615    69  487.173913           Root MSE      =  6.4125
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
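and then wonder why the first level of rep78 does not appear in your
regression table. If you add the baselevels option to your regression
command, you will see that the first level is treated as the base level and
has been omitted from the model. Because the model also has no constant, the
fitted value for cars with rep78 equal to 1 is forced to zero, so this is not
a cell means model.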
. regress mpg i.rep78, noconstant baselevels
      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    65) =  188.12
       Model |  30942.2129     4  7735.55322           Prob > F      =  0.0000
    Residual |  2672.78712    65  41.1198019           R-squared     =  0.9205
-------------+------------------------------           Adj R-squared =  0.9156
       Total |       33615    69  487.173913           Root MSE      =  6.4125
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  (base)
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
The ibn. factor-variable operator specifies that a categorical variable
should be treated as if it has no base, or, in other words, that all levels of
the categorical variable are to be included in the model; see
help fvvarlist.
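One way to see what each operator requests, before fitting anything, is to
expand the factor-variable term with fvexpand and display the result. This is
a minimal sketch, assuming the auto dataset used above; the expanded list
should carry a base marker on level 1 for i.rep78 and a no-base marker for
ibn.rep78.

```stata
* Load the example dataset used throughout this FAQ
sysuse auto, clear

* Default behavior: i.rep78 treats the lowest level as the base
fvexpand i.rep78
display "`r(varlist)'"

* ibn.rep78 requests that no level be treated as the base
fvexpand ibn.rep78
display "`r(varlist)'"
```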
What happens when you specify that rep78 should have no base level but
leave the constant in the model?
. regress mpg ibn.rep78
note: 5.rep78 omitted because of collinearity
      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    64) =    4.91
       Model |  549.415777     4  137.353944           Prob > F      =  0.0016
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.2348
-------------+------------------------------           Adj R-squared =  0.1869
       Total |   2340.2029    68  34.4147485           Root MSE      =  5.2897
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  -6.363636   4.066234    -1.56   0.123    -14.48687    1.759599
          2  |  -8.238636   2.457918    -3.35   0.001    -13.14889    -3.32838
          3  |  -7.930303    1.86452    -4.25   0.000    -11.65511   -4.205497
          4  |   -5.69697    2.02441    -2.81   0.006    -9.741193   -1.652747
          5  |  (omitted)
             |
       _cons |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------
One of the levels of rep78 is omitted from the model despite your
request that there be no base level for rep78. If you have the
constant and all levels of a categorical variable in a model, something must
be dropped because of the collinearity between all the levels and the
constant.
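The collinearity can be written out explicitly. The notation here is
introduced only for illustration and is not part of the Stata output: d_{ki}
is the indicator that observation i has rep78 equal to k, \mu_k is the mean
of mpg in level k, and \beta_0 is the constant.

```latex
\begin{align*}
  \sum_{k=1}^{5} d_{ki} &= 1 \quad \text{for every observation } i, \\
  \text{so the constant column} &= d_{1} + d_{2} + d_{3} + d_{4} + d_{5}. \\[4pt]
  \text{With level 5 omitted:}\quad
  \beta_0 &= \mu_5, \qquad \beta_k = \mu_k - \mu_5 \quad (k = 1,\dots,4).
\end{align*}
```

For example, the level 1 coefficient in the output above, -6.363636, is the
level 1 mean of 21 minus the constant, 27.36364.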
You need to use the ibn. operator on your categorical variable and the
noconstant option on your estimation command to obtain a cell means
model.
. regress mpg ibn.rep78, noconstant
      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  5,    64) =  227.47
       Model |  31824.2129     5  6364.84258           Prob > F      =  0.0000
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.9467
-------------+------------------------------           Adj R-squared =  0.9426
       Total |       33615    69  487.173913           Root MSE      =  5.2897
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |         21   3.740391     5.61   0.000     13.52771    28.47229
          2  |     19.125   1.870195    10.23   0.000     15.38886    22.86114
          3  |   19.43333   .9657648    20.12   0.000       17.504    21.36267
          4  |   21.66667   1.246797    17.38   0.000      19.1759    24.15743
          5  |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------