
How do I keep all levels of my categorical variable in my model?

How do I specify a cell means model?

Title   Keeping all levels of a variable in the model
Author Kenneth Higbee, StataCorp
Date August 2009

In the following example, we use regress as our estimation command, but the same discussion applies to other estimation commands that allow the noconstant option.

You might try

. sysuse auto, clear
(1978 Automobile Data)

. regress mpg i.rep78, noconstant

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    65) =  188.12
       Model |  30942.2129     4  7735.55322           Prob > F      =  0.0000
    Residual |  2672.78712    65  41.1198019           R-squared     =  0.9205
-------------+------------------------------           Adj R-squared =  0.9156
       Total |       33615    69  487.173913           Root MSE      =  6.4125

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------

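and then wonder why the first level of rep78 does not appear in your regression table. By default, the i. operator treats the lowest level of a categorical variable as the base level, and the noconstant option does not change that. If you add the baselevels option to your regression command, you will see that level 1 is the base level and has been omitted from the model.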

. regress mpg i.rep78, noconstant baselevels

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    65) =  188.12
       Model |  30942.2129     4  7735.55322           Prob > F      =  0.0000
    Residual |  2672.78712    65  41.1198019           R-squared     =  0.9205
-------------+------------------------------           Adj R-squared =  0.9156
       Total |       33615    69  487.173913           Root MSE      =  6.4125

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  (base)   
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
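Omitting the base level matters more than it might seem. With no constant and 1.rep78 as the base, every regressor is 0 for the cars with rep78 = 1, so their fitted mpg is 0. The coefficients for levels 2 through 5 are unaffected and match the cell means model shown at the end of this FAQ. The Python sketch below checks this against the printed sums of squares. It relies on two facts not shown in this output: the auto data contain 2 cars with rep78 = 1, and their mean mpg is 21 (the 1.rep78 coefficient in the cell means model).

```python
# Figures copied from the Stata output in this FAQ.
rss_base_dropped = 2672.78712   # Residual SS: regress mpg i.rep78, noconstant
rss_cell_means = 1790.78712     # Residual SS: regress mpg ibn.rep78, noconstant
mss_base_dropped = 30942.2129   # Model SS:    regress mpg i.rep78, noconstant
mss_cell_means = 31824.2129     # Model SS:    regress mpg ibn.rep78, noconstant

# Assumption: 2 cars have rep78 == 1, with mean mpg 21.
n1, mean1 = 2, 21.0

# Those cars are fitted at 0 instead of at their group mean, which moves
# n1 * mean1**2 from the model sum of squares into the residual.
extra = n1 * mean1 ** 2
print(extra)                                        # 882.0
print(round(rss_base_dropped - rss_cell_means, 4))  # 882.0
print(round(mss_cell_means - mss_base_dropped, 4))  # 882.0
```

This also explains why Root MSE is larger here (6.4125) than in the cell means fit (5.2897), even though the four reported coefficients are identical.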

The ibn. factor-variable operator specifies that a categorical variable is to be treated as having no base level; in other words, all levels of the categorical variable are included in the model (see help fvvarlist).
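The difference between the two operators comes down to which indicator columns are created. In the Python sketch below, indicator_columns is a hypothetical helper, not part of Stata, and the rep78 values are made up. The helper mimics i., which uses the lowest level as the base by default and creates no column for it, and ibn., which creates a column for every level.

```python
def indicator_columns(values, no_base=False):
    """Hypothetical helper mimicking factor-variable coding.

    Returns {level: [0/1 indicator for each observation]}.
    no_base=False: the lowest level is the base and gets no column (like i.).
    no_base=True: every level gets a column (like ibn.).
    """
    levels = sorted(set(values))
    if not no_base:
        levels = levels[1:]  # drop the base (lowest) level
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


rep78 = [3, 1, 5, 3, 2, 4]  # hypothetical values
print(sorted(indicator_columns(rep78)))                # [2, 3, 4, 5]
print(sorted(indicator_columns(rep78, no_base=True)))  # [1, 2, 3, 4, 5]
print(indicator_columns(rep78, no_base=True)[3])       # [1, 0, 0, 1, 0, 0]
```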

What happens when you specify that rep78 should have no base level but leave the constant in the model?

. regress mpg ibn.rep78

note: 5.rep78 omitted because of collinearity

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    64) =    4.91
       Model |  549.415777     4  137.353944           Prob > F      =  0.0016
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.2348
-------------+------------------------------           Adj R-squared =  0.1869
       Total |   2340.2029    68  34.4147485           Root MSE      =  5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  -6.363636   4.066234    -1.56   0.123    -14.48687    1.759599
          2  |  -8.238636   2.457918    -3.35   0.001    -13.14889    -3.32838
          3  |  -7.930303    1.86452    -4.25   0.000    -11.65511   -4.205497
          4  |   -5.69697    2.02441    -2.81   0.006    -9.741193   -1.652747
          5  |  (omitted)
             |
       _cons |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------

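One of the levels of rep78 (here, level 5) is omitted from the model despite your request that rep78 have no base level. If a model includes both the constant and an indicator for every level of a categorical variable, the indicators sum to 1 for every observation and so reproduce the constant exactly. Because of this collinearity, something must be dropped. In this output, _cons estimates the mean of mpg for level 5, and each remaining coefficient is the difference between that level's mean and the level-5 mean.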
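The collinearity can be checked directly on the design matrix. The Python sketch below rebuilds the rep78 indicator columns for the 69 cars used in these regressions. It assumes the rep78 frequencies in the auto data: 2, 8, 30, 18, and 11 cars at levels 1 through 5. The sketch then confirms that the indicators add up to the constant column in every row.

```python
# rep78 frequencies in the auto data (69 cars; the 5 cars with missing
# rep78 are excluded from the regressions).
counts = {1: 2, 2: 8, 3: 30, 4: 18, 5: 11}

# One entry per observation, ordered by level.
rep78 = [lvl for lvl, n in counts.items() for _ in range(n)]
print(len(rep78))  # 69

# Columns of the design matrix for "regress mpg ibn.rep78" (constant kept).
cons = [1] * len(rep78)
dummies = {lvl: [1 if v == lvl else 0 for v in rep78] for lvl in counts}

# Each car falls in exactly one level, so the five indicators sum to the
# constant column in every row: _cons - 1.rep78 - ... - 5.rep78 = 0.
row_sums = [sum(col[i] for col in dummies.values()) for i in range(len(rep78))]
print(row_sums == cons)  # True
```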

To obtain a cell means model, in which each coefficient is the mean of mpg within one level of rep78, use the ibn. operator on your categorical variable and the noconstant option on your estimation command.
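Why do the coefficients of a cell means model equal the group means? The Python sketch below works through the least-squares algebra on made-up data (the values in data are hypothetical, not from the auto dataset). It fits OLS by solving the normal equations X'X b = X'y with a small Gauss-Jordan solver. With one indicator per level and no constant, X'X is diagonal with the group counts on the diagonal, so each coefficient reduces to its group's sum divided by its count.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least squares via the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: (level, outcome) pairs.
data = [(1, 18.0), (1, 24.0), (2, 20.0), (2, 22.0), (2, 15.0), (3, 30.0)]
levels = sorted({lvl for lvl, _ in data})
y = [v for _, v in data]

# Cell means design: one indicator per level, no constant column.
X = [[1.0 if lvl == l else 0.0 for l in levels] for lvl, _ in data]
print(ols(X, y))  # [21.0, 19.0, 30.0]
```

If a column of ones is prepended to X, X'X becomes singular and solve raises ValueError. This is the same situation that forces Stata to omit a level when the constant is kept.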

. regress mpg ibn.rep78, noconstant

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  5,    64) =  227.47
       Model |  31824.2129     5  6364.84258           Prob > F      =  0.0000
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.9467
-------------+------------------------------           Adj R-squared =  0.9426
       Total |       33615    69  487.173913           Root MSE      =  5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |         21   3.740391     5.61   0.000     13.52771    28.47229
          2  |     19.125   1.870195    10.23   0.000     15.38886    22.86114
          3  |   19.43333   .9657648    20.12   0.000       17.504    21.36267
          4  |   21.66667   1.246797    17.38   0.000      19.1759    24.15743
          5  |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------
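The fit with the constant (5.rep78 omitted) and this cell means fit are two parameterizations of the same model: both report Residual SS 1790.78712 on 64 degrees of freedom. The Python sketch below uses numbers copied from the two outputs to check two consequences, up to the rounding of the printed values. First, _cons plus each k.rep78 coefficient from the fit with the constant reproduces the cell means. Second, each cell-means standard error equals Root MSE divided by the square root of the group size. The group sizes (2, 8, 30, 18, 11 cars for rep78 = 1 through 5) are the rep78 frequencies in the auto data; they are not printed in this FAQ.

```python
import math

# regress mpg ibn.rep78 (constant kept, 5.rep78 omitted).
cons = 27.36364
diffs = {1: -6.363636, 2: -8.238636, 3: -7.930303, 4: -5.69697, 5: 0.0}

# regress mpg ibn.rep78, noconstant (cell means model).
cell_means = {1: 21.0, 2: 19.125, 3: 19.43333, 4: 21.66667, 5: 27.36364}
cell_se = {1: 3.740391, 2: 1.870195, 3: 0.9657648, 4: 1.246797, 5: 1.594908}
root_mse = 5.2897

# Assumed group sizes for rep78 = 1..5 in the auto data.
counts = {1: 2, 2: 8, 3: 30, 4: 18, 5: 11}

for lvl in cell_means:
    # _cons + k.rep78 is the mean of mpg at level k.
    implied_mean = cons + diffs[lvl]
    # With one indicator per level and no constant, X'X is diagonal with
    # entries n_k, so se(b_k) = Root MSE / sqrt(n_k).
    implied_se = root_mse / math.sqrt(counts[lvl])
    print(lvl, round(implied_mean, 4), round(cell_means[lvl], 4),
          round(implied_se, 4), round(cell_se[lvl], 4))
```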