
## How do I specify a cell means model?

Title: Keeping all levels of a variable in the model

Author: Kenneth Higbee, StataCorp

In the following example, we use regress as our estimation command, but the same applies to any other estimation command that has a noconstant option.

You might try

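    . sysuse auto, clear
    (1978 Automobile Data)

    . regress mpg i.rep78, noconstant

          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 65)        =    188.12
           Model |  30942.2129         4  7735.55322   Prob > F        =    0.0000
        Residual |  2672.78712        65  41.1198019   R-squared       =    0.9205
    -------------+----------------------------------   Adj R-squared   =    0.9156
           Total |       33615        69  487.173913   Root MSE        =    6.4125

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rep78 |
              2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
              3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
              4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
              5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
    ------------------------------------------------------------------------------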



and then wonder why the first level of rep78 does not appear in your regression table. If you add the baselevels option to your regression command, you will see that the first level is considered a base level and has been omitted from the model.

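    . regress mpg i.rep78, noconstant baselevels

          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 65)        =    188.12
           Model |  30942.2129         4  7735.55322   Prob > F        =    0.0000
        Residual |  2672.78712        65  41.1198019   R-squared       =    0.9205
    -------------+----------------------------------   Adj R-squared   =    0.9156
           Total |       33615        69  487.173913   Root MSE        =    6.4125

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rep78 |
              1  |          0  (base)
              2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
              3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
              4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
              5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
    ------------------------------------------------------------------------------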
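
Written out, the specification above can be sketched as follows. The symbols $\beta_j$, $d_{ij}$, and $\varepsilon_i$ are notation introduced here for illustration and do not appear in the Stata output.

$$
\text{mpg}_i = \sum_{j=2}^{5} \beta_j \, d_{ij} + \varepsilon_i,
\qquad
d_{ij} =
\begin{cases}
1 & \text{if } \text{rep78}_i = j, \\
0 & \text{otherwise.}
\end{cases}
$$

Because there is neither a constant nor an indicator for level 1, the fitted value of mpg for every car with rep78 equal to 1 is 0, so this is not a cell means model.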



The ibn. factor-variable operator specifies that a categorical variable should be treated as if it has no base, or, in other words, that all levels of the categorical variable are to be included in the model; see [U] 11.4.3 Factor variables.

What happens when you specify that rep78 should have no base level but leave the constant in the model?

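    . regress mpg ibn.rep78
    note: 5.rep78 omitted because of collinearity

          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 64)        =      4.91
           Model |  549.415777         4  137.353944   Prob > F        =    0.0016
        Residual |  1790.78712        64  27.9810488   R-squared       =    0.2348
    -------------+----------------------------------   Adj R-squared   =    0.1869
           Total |   2340.2029        68  34.4147485   Root MSE        =    5.2897

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rep78 |
              1  |  -6.363636   4.066234    -1.56   0.123    -14.48687    1.759599
              2  |  -8.238636   2.457918    -3.35   0.001    -13.14889    -3.32838
              3  |  -7.930303    1.86452    -4.25   0.000    -11.65511   -4.205497
              4  |   -5.69697    2.02441    -2.81   0.006    -9.741193   -1.652747
              5  |          0  (omitted)
                 |
           _cons |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
    ------------------------------------------------------------------------------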



One of the levels of rep78 is omitted from the model despite your request that there be no base level for rep78. If you have the constant and all levels of a categorical variable in a model, something must be dropped because of the collinearity between all the levels and the constant.
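
The collinearity can be made explicit. As a sketch, let $d_{ij}$ be 1 when observation $i$ has rep78 equal to $j$ and 0 otherwise (notation assumed here, not taken from the Stata output). Every observation in the estimation sample falls in exactly one level, so the indicators sum to the constant's column of ones:

$$
\sum_{j=1}^{5} d_{ij} = 1 \quad \text{for every } i .
$$

With level 5 dropped, the remaining parameters are a reparameterization of the cell means $\mu_j$, where $b_0$ denotes _cons and $b_j$ the coefficient on level $j$:

$$
b_0 = \mu_5, \qquad b_j = \mu_j - \mu_5 \quad (j = 1, \dots, 4).
$$

For example, for level 2 the output above gives $27.36364 + (-8.238636) \approx 19.125$, which is the level-2 coefficient reported in the noconstant fits earlier.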

You need to use the ibn. operator on your categorical variable and the noconstant option on your estimation command to obtain a cell means model.

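    . regress mpg ibn.rep78, noconstant

          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(5, 64)        =    227.47
           Model |  31824.2129         5  6364.84258   Prob > F        =    0.0000
        Residual |  1790.78712        64  27.9810488   R-squared       =    0.9467
    -------------+----------------------------------   Adj R-squared   =    0.9426
           Total |       33615        69  487.173913   Root MSE        =    5.2897

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rep78 |
              1  |         21   3.740391     5.61   0.000     13.52771    28.47229
              2  |     19.125   1.870195    10.23   0.000     15.38886    22.86114
              3  |   19.43333   .9657648    20.12   0.000       17.504    21.36267
              4  |   21.66667   1.246797    17.38   0.000      19.1759    24.15743
              5  |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
    ------------------------------------------------------------------------------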
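
In a cell means model, each coefficient is the sample mean of the outcome within that level, which suggests a quick check. The following is a minimal sketch, assuming the auto dataset shipped with Stata. It uses tabstat to list the mean of mpg and the number of cars at each level of rep78, for comparison with the coefficient table above.

```stata
* Sketch: compare the cell means coefficients with the group means of mpg
sysuse auto, clear

* Cell means model: all levels of rep78, no constant
regress mpg ibn.rep78, noconstant

* Mean of mpg and number of observations within each level of rep78
* (nototal suppresses the overall row; cars with missing rep78 are excluded)
tabstat mpg, by(rep78) statistics(mean n) nototal
```

The means reported by tabstat should agree, up to display rounding, with the five coefficients in the regression table (21, 19.125, 19.43333, 21.66667, and 27.36364).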



