Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: regress with vce(robust) and hascons

From: "Michael N. Mitchell"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: regress with vce(robust) and hascons
Date: Mon, 13 Dec 2010 13:06:09 -0800

Dear Jeff,

Thank you very kindly for such an extensive answer. I strongly suspected that the flaw in my reasoning came from overgeneralizing from an ANOVA perspective. Your email is very clear and very illuminating. I understand the switch from a sum-of-squares approach to a Wald approach, and how the two approaches diverge in the case of robust standard errors. The way you showed the divergence between the Wald test and the computations using sums of squares makes great sense.
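
Editorial note: for reference, the two statistics being contrasted can be written in standard textbook form. The symbols below ($b$, $\widehat{V}$, $R$, $q$, MSS, RSS) are notation assumed here for illustration and do not appear in the original messages.

$$
F_{\text{ANOVA}} = \frac{\mathrm{MSS}/df_m}{\mathrm{RSS}/df_r},
\qquad
F_{\text{Wald}} = \frac{(Rb)'\,\bigl(R\,\widehat{V}\,R'\bigr)^{-1}\,(Rb)}{q}
$$

Here MSS and RSS are the model and residual sums of squares, $b$ is the coefficient vector, $\widehat{V}$ is its estimated variance matrix, and $R$ is a $q \times k$ matrix encoding the $q$ linear restrictions $Rb = 0$. With the classical estimate $\widehat{V} = s^2 (X'X)^{-1}$ and $R$ selecting every non-constant coefficient, the two expressions give the same value. With a robust $\widehat{V}$ the Wald form is still defined, but it no longer corresponds to a ratio of mean squares.
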
Unfortunately, I am still stuck on the issue of the model degrees of freedom. Without -vce(robust)-, switching from -nocons- to -hascons- changes the model degrees of freedom from 2 to 1 (see below).
```
------- SNIP ------
. *** WITHOUT -vce(robust)-
. quietly regress price ibn.foreign, nocons
. di e(df_m)
2
. quietly regress price ibn.foreign, hascons
. di e(df_m)
1
------- SNIP ------

```
But in the presence of -vce(robust)-, the model degrees of freedom are the same in both cases, still 2 df (see below). Can you explain why the model degrees of freedom do not change from 2 to 1 when switching from -nocons- to -hascons- in the presence of -vce(robust)-? It seems that these omnibus F-tests are testing the same null hypothesis (that price is equal to 0).
```
------- SNIP ------
. *** WITH -vce(robust)-
. quietly regress price ibn.foreign, nocons vce(robust)
. di e(df_m)
2
. quietly regress price ibn.foreign, hascons vce(robust)
. di e(df_m)
2
------- SNIP ------
```

Many thanks,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com

On 2010-12-13 9.21 AM, Jeff Pitblado, StataCorp LP wrote:

Michael N. Mitchell <Michael.Norman.Mitchell@gmail.com> is using the -hascons- option with -regress, vce(robust)- and noticed the model F statistic has a different interpretation than the one for -regress- without the -vce(robust)- option:

I am puzzled by the behavior of Stata when I include the -vce(robust)- option along with the -hascons- option.

Consider the example below in which I estimate a model predicting -price- from -foreign- but do so using a cell means model by specifying ibn.foreign and thus include the -hascons- option. I further want robust standard errors so specify the -vce(robust)- option.

```
. sysuse auto, clear
(1978 Automobile Data)
. regress price ibn.foreign, vce(robust) hascons

Linear regression                                      Number of obs =      74
F(  2,    72) =  165.64
Prob>  F      =  0.0000
R-squared     =  0.0024
Root MSE      =  2966.4

------------------------------------------------------------------------------
|               Robust
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
0  |   6072.423   431.2084    14.08   0.000     5212.825    6932.021
1  |   6384.682   553.6754    11.53   0.000      5280.95    7488.413
|
_cons |  (omitted)
------------------------------------------------------------------------------
```

The omnibus F test shows 2 degrees of freedom, but I only expected 1 df. The omnibus F test appears to be testing the joint hypothesis that each of the cell means is 0 (see below).

```
. test 0.foreign 1.foreign

( 1)  0bn.foreign = 0
( 2)  1.foreign = 0

F(  2,    72) =  165.64
Prob>  F =    0.0000
```

But because I specified -hascons- I expect it to test the equality of the cell means. This is the case when I omit the -vce(robust)-, as shown below.

```
. regress price ibn.foreign, hascons

Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
Model |  1507382.66     1  1507382.66           Prob>  F      =  0.6802
Residual |   633558013    72  8799416.85           R-squared     =  0.0024
Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
0  |   6072.423    411.363    14.76   0.000     5252.386     6892.46
1  |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------
```

In this case, the omnibus F test matches the test of the equality of the cell means.

```
. test 0.foreign = 1.foreign

( 1)  0bn.foreign - 1.foreign = 0

F(  1,    72) =    0.17
Prob>  F =    0.6802
```

Perhaps someone can help me understand where I am askew in my thinking about this.

```

There is no bug in the value of the F statistic when the -vce(robust)- option
is used with the -hascons- option.  The -vce(robust)- causes -regress- to
perform all inference based on the linearized variance estimator instead of
using the reduction in error sum of squares.

We did notice that -regress, hascons vce(robust)- reports an '(omitted)'
intercept when it shouldn't.  This will be fixed in the next executable
update.

Let's look at the model F statistic that -regress- reports.  First let's fit a
simple linear regression of 'price' on the 'foreign' factor variable:

***** BEGIN:
. regress price i.foreign

Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
Model |  1507382.66     1  1507382.66           Prob>  F      =  0.6802
Residual |   633558013    72  8799416.85           R-squared     =  0.0024
Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.foreign |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
_cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
------------------------------------------------------------------------------
***** END:

The model F statistic is 0.17.  This value is a function of the reduction in
the error sum of squares, and is the ratio of the model mean squares over the
error mean squares.

***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 1507382.7

. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9

. di (e(mss)/e(df_m))/e(rmse)^2
.17130484
***** END:

We can also compute this value by performing a Wald test on all the
coefficients in the model (excluding the intercept), with the Null hypothesis
that they are all equal to zero:

***** BEGIN:
. test [#1]

( 1)  0b.foreign = 0
( 2)  1.foreign = 0
Constraint 1 dropped

F(  1,    72) =    0.17
Prob>  F =    0.6802
***** END:

We see that the ANOVA style F statistic (based on the ratio of mean squares) is
computationally equivalent to the Wald F statistic.

For this particular model, the above Null hypothesis also implies that the
expected value of 'price' for Foreign cars is equal to the expected value of
'price' for Domestic.

Now let's refit our model with the -noconstant- option.  We'll also use the
-bn- operator on 'foreign' to prevent Stata from omitting a base level.

***** BEGIN:
. regress price bn.foreign, nocons

Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    72) =  159.91
Model |  2.8143e+09     2  1.4071e+09           Prob>  F      =  0.0000
Residual |   633558013    72  8799416.85           R-squared     =  0.8162
Total |  3.4478e+09    74  46592355.7           Root MSE      =  2966.4

------------------------------------------------------------------------------
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
0  |   6072.423    411.363    14.76   0.000     5252.386     6892.46
1  |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------
***** END:

Notice that the value and degrees of freedom for the model F statistic have
changed; so has the sum of squares for the model.  Here we reproduce the mean
squares and model F statistic for this model:

***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 1.407e+09

. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9

. di (e(mss)/2)/e(rmse)^2
159.91266
***** END:

Here is the equivalent Wald test:

***** BEGIN:
. test [#1]

( 1)  0bn.foreign = 0
( 2)  1.foreign = 0

F(  2,    72) =  159.91
Prob>  F =    0.0000
***** END:

It is now clear that the Null hypothesis for this model F statistic is
not the same as our previous model.  Here the Null hypothesis is that 'price'
has an expected value of zero.

Now let's refit Michael's -hascons- model, without the -vce(robust)- option:

***** BEGIN:
. regress price bn.foreign, hascons

Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
Model |  1507382.66     1  1507382.66           Prob>  F      =  0.6802
Residual |   633558013    72  8799416.85           R-squared     =  0.0024
Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
0  |   6072.423    411.363    14.76   0.000     5252.386     6892.46
1  |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------
***** END:

Again, the model F statistic is derived from a reduction in the error sum of
squares.  The -hascons- option implies that there is a constant in the model,
thus the model F statistic will test against the mean-only model like we did
in our first model fit.  Here is the F statistic computed using the mean
squares:

***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 1507382.7

. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9

. di e(mss)/e(rmse)^2
.17130484
***** END:

And here is the F statistic from the Wald test:

***** BEGIN:
. test [#1]

( 1)  0bn.foreign = 0
( 2)  1.foreign = 0

F(  2,    72) =  159.91
Prob>  F =    0.0000
***** END:

Notice that, given our current model fit, the Null hypothesis for this Wald
test is that the expected value of 'price' is zero.  This is not the same as
the Null for the reported model F statistic in the -hascons- model.

Now consider refitting this model with robust/linearized variance estimates
(VCE).  Using the -vce(robust)- option causes -regress- to perform all
inference using the linearized VCE.  The analog/equivalence between the Wald F
statistic and ANOVA style F statistic breaks down in this case.  With
-vce(robust)-, -regress- is forced to use the Wald F statistic; there is no
equivalent linearized version of the F statistic formed from the ratio of mean
squares.

Here is the -hascons- model fit with linearized VCE:

***** BEGIN:
. regress price bn.foreign, hascons vce(robust)

Linear regression                                      Number of obs =      74
F(  2,    72) =  165.64
Prob>  F      =  0.0000
R-squared     =  0.0024
Root MSE      =  2966.4

------------------------------------------------------------------------------
|               Robust
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
0  |   6072.423   431.2084    14.08   0.000     5212.825    6932.021
1  |   6384.682   553.6754    11.53   0.000      5280.95    7488.413
------------------------------------------------------------------------------
***** END:

Notice that -regress- doesn't even report the ANOVA table when -vce(robust)-
is specified.  We can still compute the mean squares from -regress-'s -e()-
results:

***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 753691.33

. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9

. di e(mss)/e(rmse)^2
.17130484
***** END:

However, -vce(robust)- prevents us from using them to make inferences.

Here we show that the model F statistic reported by -regress, vce(robust)-
comes from the Wald test on the model coefficients.

***** BEGIN:
. test [#1]

( 1)  0bn.foreign = 0
( 2)  1.foreign = 0

F(  2,    72) =  165.64
Prob>  F =    0.0000
***** END:

--Jeff
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
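
Editorial note: Jeff's explanation says that with -vce(robust)- the reported model F is the Wald test on the model coefficients. Read that way, it would account for e(df_m) being 2 under both -nocons- and -hascons-, although that reading is an inference from his message rather than something the thread states outright. The sketch below, which is not part of the original session, shows one way to request the 1-df test of equal cell means explicitly after the robust fit. It uses only the commands and stored results already seen in the thread plus r(F) and r(df) from -test-; no output is shown because none was run for this note.

```stata
* Editorial sketch, not part of the original thread.
sysuse auto, clear

* With -vce(robust)-, the reported model F is the Wald test that all
* coefficients are zero, so it has 2 numerator df in this model.
regress price ibn.foreign, hascons vce(robust)
display "reported model F = " e(F) "  with df_m = " e(df_m)

* The 1-df hypothesis of equal cell means has to be requested explicitly;
* -test- uses the robust VCE left behind by the preceding -regress-.
test 0.foreign = 1.foreign
display "equality-of-means F = " r(F) "  with df = " r(df)
```

Because -test- works from the stored coefficient vector and variance matrix, the same two lines can follow the -nocons- fit as well, and the result does not depend on which of the two options was used to obtain the cell means.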