Re: st: Strange -robust- results with a dummy variable

From: "Michael N. Mitchell" <[email protected]>
To: [email protected]
Subject: Re: st: Strange -robust- results with a dummy variable
Date: Fri, 21 Jan 2011 18:59:05 -0800

Dear Catherine,

My hunch is that you have very unequal Ns for the two levels of the dummy variable -d-, together with very unequal variances in the two groups. If you think of this, for the moment, as a t-test (or an ANOVA), this would be described as violating the homogeneity of variance assumption. This issue is discussed on the web page
```
http://www.ats.ucla.edu/stat/stata/library/homvar.htm

```
in the context of an ANOVA framework. As noted on that page, the "robust" option provides more appropriate p values in such a case.
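As a rough illustration of that point, here is a minimal sketch on simulated data. The group sizes, the standard deviations, and the seed are all made-up assumptions, not values from Catherine's data. It compares the pooled and unequal-variance t-tests, and then the same comparison run as a regression with and without robust standard errors.

```stata
* Simulated data only: group sizes and standard deviations are assumptions
clear
set seed 2011
set obs 1000
generate d = (_n <= 20)                        // small group: 20 of 1,000 observations
generate y = rnormal(0, cond(d, 0.002, 0.02))  // small group has a much smaller spread

ttest y, by(d)              // pooled-variance t-test (assumes equal variances)
ttest y, by(d) unequal      // Satterthwaite approximation for unequal variances

regress y d                 // conventional standard errors
regress y d, vce(robust)    // heteroskedasticity-robust standard errors
```

With a setup like this, where the small group is also the less variable one, the conventional standard error for -d- should tend to be noticeably larger than the robust one, which is the same direction as the change reported in the original post.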
```
To check whether this is the case, I would suggest trying this command

tabstat y, by(d) stat(mean sd n)

```
which will show the mean, sd, and n for y by d (we are temporarily ignoring x, for simplicity).
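If -d- equals 1 on only a single day, as the original post describes, the group with d == 1 has n = 1 and no standard deviation can be computed for it. The sketch below uses simulated data (the sample size, coefficients, and seed are assumptions chosen only for illustration) to show what the check looks like in that extreme case. With a dummy that flags one observation, OLS fits that observation exactly, so its residual is zero up to rounding, and that is worth keeping in mind when reading the robust standard error for -d-.

```stata
* Simulated data only: names mirror the thread, all values are made up
clear
set seed 12345
set obs 5000
generate x = rnormal(0, 0.01)
generate d = (_n == 2500)            // dummy equal to 1 for exactly one observation
generate y = 0.0007 + 0.02*x + rnormal(0, 0.02)

tabstat y, by(d) stat(mean sd n)     // sd cannot be computed for the group with n = 1

regress y x d
predict double e, residuals
list y x d e if d == 1               // residual for the lone d == 1 observation

regress y x d, vce(robust)           // compare the standard error of d with the run above
```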
```
I hope this helps,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com

On 2011-01-21 3.34 PM, Liu Yu wrote:
```
```Dear Statalist.

I have got a weird result when I run the following two regressions. (In the
following regressions, y is a daily stock return data from 1990 to 2010, x
is the daily market return data for the same period, and d is a dummy
variable which equals 1 on Nov-10-2001 and 0 otherwise.)

The first is a simple OLS regression:

. reg y x d

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0237359     .03177     0.75   0.455    -.0385487    .0860204
           d |  -.0074946    .025867    -0.29   0.772    -.0582064    .0432172
       _cons |   .0007387   .0003825     1.93   0.054    -.0000112    .0014886

The second equals the first regression plus the "robust" option:

. reg y x d, robust

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0237359   .0304741     0.78   0.436    -.0360082    .0834799
           d |  -.0074946    .000539   -13.90   0.000    -.0085514   -.0064378
       _cons |   .0007387   .0003834     1.93   0.054    -.0000131    .0014904
------------------------------------------------------------------------------

I am quite surprised that the standard error of d decreases substantially
when I use the robust option, and that its t-statistic changes from
non-significant to significant. Should I trust the results from the
second regression? Is there something special that I need to pay attention
to when combining a dummy variable with the robust option?

Thank you all.

Catherine Liu

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```