# Re: st: Dummy variable p value question

 From "Joseph Coveney" To "Statalist" Subject Re: st: Dummy variable p value question Date Mon, 1 Dec 2008 10:39:35 +0900

```
Jimmy Verner wrote:

Suppose you have an interval dependent variable Y, an interval
independent variable B and a nominal variable C.  C has four
categories, C1, C2, C3 and C4.  C is coded by four dummy variables, C1
through C4, with the value 1 when "in play" and the value 0 otherwise.

One may regress Y on B and C1 through C4 by dropping the constant:

Model A:  reg Y B C1 C2 C3 C4, nocon

Alternatively, one may keep the constant but drop a category to avoid
falling into the dummy variable trap.  The constant replaces the
dropped category:

Model B:  reg Y B C1 C2 C3

If what I have said is correct, why are the p values different for C1
through C3 between the two models?  And should not the p value for C4
in Model A be the same as for the constant in Model B?

--------------------------------------------------------------------------------

First question:

Regression coefficients in the first parameterization of the model are the
means of Y for each category adjusted for B (that is, the category-specific
intercepts, the predicted Y at B = 0).  The regression coefficients in the
second parameterization of the model are the differences between the means
of Y for categories 1 through 3 and the mean for category 4, all adjusted
for B.  (See the coefficients below.)  The null hypotheses tested by the
first parameterization are that the adjusted means are equal to zero.  Those
in the second are that the adjusted means for categories 1 through 3 are
each equal to that for category 4, i.e., that each difference is zero.  The
two sets of p values answer different questions, so they differ.

Second question:

Yes--the C4 row in the first regression and the _cons row in the second are
identical; see the results below.

Joseph Coveney

sysuse auto, clear
rename mpg Y
rename weight B
recode rep78 (5=4)
tabulate rep78, generate(C)
regress Y B C1 C2 C3 C4, noconstant
regress Y B C1 C2 C3

Results:

. regress Y B C1 C2 C3 C4, noconstant

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  5,    64) =  515.31
       Model |   32800.265     5  6560.05299           Prob > F      =  0.0000
    Residual |  814.735035    64  12.7302349           R-squared     =  0.9758
-------------+------------------------------           Adj R-squared =  0.9739
       Total |       33615    69  487.173913           Root MSE      =  3.5679

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           B |  -.0057832   .0005962    -9.70   0.000    -.0069744   -.0045921
          C1 |   38.92806    3.12755    12.45   0.000     32.68006    45.17606
          C2 |   38.52056   2.364302    16.29   0.000     33.79732    43.24379
          C3 |   38.51226   2.072076    18.59   0.000     34.37281    42.65171
          C4 |   39.22498    1.72017    22.80   0.000     35.78854    42.66141
------------------------------------------------------------------------------

. regress Y B C1 C2 C3

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    64) =   29.96
       Model |  1525.46786     4  381.366966           Prob > F      =  0.0000
    Residual |  814.735035    64  12.7302349           R-squared     =  0.6519
-------------+------------------------------           Adj R-squared =  0.6301
       Total |   2340.2029    68  34.4147485           Root MSE      =  3.5679

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           B |  -.0057832   .0005962    -9.70   0.000    -.0069744   -.0045921
          C1 |   -.296918   2.621481    -0.11   0.910    -5.533929    4.940093
          C2 |  -.7044196   1.483296    -0.47   0.636    -3.667644    2.258805
          C3 |  -.7127189   1.003684    -0.71   0.480    -2.717809    1.292371
       _cons |   39.22498    1.72017    22.80   0.000     35.78854    42.66141
------------------------------------------------------------------------------

```
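
The link between the two parameterizations can also be checked from the no-constant model alone. The following is a minimal sketch, not part of the original post: it assumes the same auto-data setup as above and uses Stata's `lincom` and `test` postestimation commands to form each category's difference from category 4. If the reasoning in the reply holds, each `lincom` line should reproduce the estimate, standard error and p value of the matching C1 to C3 row in the regression that keeps the constant.

```stata
* Sketch only (not from the original post): recover the second
* parameterization's contrasts from the no-constant model
sysuse auto, clear
rename mpg Y
rename weight B
recode rep78 (5=4)
tabulate rep78, generate(C)

* Model A: no constant, one coefficient per category
regress Y B C1 C2 C3 C4, noconstant

* Differences from category 4; these should match the C1-C3 rows of Model B
lincom C1 - C4
lincom C2 - C4
lincom C3 - C4

* Joint test that all four adjusted means are equal
test (C1 = C4) (C2 = C4) (C3 = C4)

* Model B for comparison; testing its three dummies jointly is the same hypothesis
regress Y B C1 C2 C3
test C1 C2 C3
```

The two `test` commands address one hypothesis (no differences among the four categories after adjusting for B), so they should report the same F statistic, whereas the individual p values in the no-constant model only test each adjusted mean against zero.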