Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Strange -robust- results with a dummy variable
From: "Michael N. Mitchell" <[email protected]>
To: [email protected]
Subject: Re: st: Strange -robust- results with a dummy variable
Date: Fri, 21 Jan 2011 18:59:05 -0800
Dear Catherine,
  My hunch is that the two levels of the dummy variable -d- have very unequal Ns, combined
with very unequal variances in the two groups. If you think of this for the moment as a
t-test (or an ANOVA), it would be described as a violation of the homogeneity of variance
assumption. This issue is discussed, in the context of an ANOVA framework, on the web page
 http://www.ats.ucla.edu/stat/stata/library/homvar.htm
As noted on that page, the "robust" option provides more appropriate p-values in such a case.
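A minimal sketch, in Python with only the standard library, of what the "robust" option changes mechanically. It fits y on a constant, x and d by OLS and computes both the classical standard errors and the sandwich (heteroskedasticity-robust) standard errors in the HC1 form, i.e. scaled by n/(n-k), which to my understanding is the variant -regress, robust- reports. The data are simulated and are NOT the data from the question: 5,000 hypothetical daily returns, with d equal to 1 on a single day as in Catherine's setup. The helper names inv3 and ols_with_se, the seed, and all parameter values are illustrative assumptions.

```python
import math
import random


def inv3(m):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def ols_with_se(rows, y):
    """OLS of y on rows (each row = [1, x, d]).

    Returns coefficients, classical SEs, HC1 ("robust") SEs and residuals.
    """
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    a = inv3(xtx)  # (X'X)^-1
    beta = [sum(a[p][q] * xty[q] for q in range(k)) for p in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]

    # Classical SE: sqrt(s^2 * diag((X'X)^-1)), with s^2 = RSS / (n - k)
    s2 = sum(e * e for e in resid) / (n - k)
    classical = [math.sqrt(s2 * a[p][p]) for p in range(k)]

    # Robust SE: sandwich (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1, times n/(n-k) (HC1)
    meat = [[sum(e * e * r[p] * r[q] for r, e in zip(rows, resid)) for q in range(k)]
            for p in range(k)]
    robust = [
        math.sqrt(n / (n - k) * sum(a[p][i] * meat[i][j] * a[j][p]
                                    for i in range(k) for j in range(k)))
        for p in range(k)
    ]
    return beta, classical, robust, resid


# Simulated data (NOT the data from the question): about 20 years of daily returns
random.seed(2011)
n = 5000
day = n // 2                                                     # hypothetical "event" day
x = [random.gauss(0.0, 0.01) for _ in range(n)]                  # market return
y = [0.0007 + 0.02 * xi + random.gauss(0.0, 0.025) for xi in x]  # stock return
d = [1.0 if t == day else 0.0 for t in range(n)]                 # one-day dummy
rows = [[1.0, xi, di] for xi, di in zip(x, d)]

beta, classical, robust, resid = ols_with_se(rows, y)
print("residual on the d = 1 day:", resid[day])
for name, b, se_c, se_r in zip(["_cons", "x", "d"], beta, classical, robust):
    print(f"{name:>6}  coef={b: .6f}  classical SE={se_c:.6f}  robust SE={se_r:.6f}")
```

With a dummy that equals 1 for just one observation, that observation is fitted exactly, so its residual is zero and it adds nothing to the robust "meat" term. On simulated data like this, the robust SE of d comes out far below the classical one, which resembles the pattern in the output quoted below.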
  To check whether this is the case, I would suggest trying this command:
tabstat y, by(d) stat(mean sd n)
  which shows the mean, SD, and N of y for each level of d (we are temporarily ignoring x,
for simplicity).
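For readers working outside Stata, a minimal Python sketch of the same check: the per-level mean, SD and N of y, as the -tabstat- call above reports them (tabstat's Total row is omitted). The function name tabstat_by and the five toy values are hypothetical. A level with a single observation has no defined SD; the sketch returns None there, mirroring the missing value Stata would show.

```python
import statistics
from collections import defaultdict


def tabstat_by(y, group):
    """Mean, SD and N of y for each level of group (hypothetical helper).

    SD is None when a level has only one observation, since a sample SD
    needs at least two values.
    """
    cells = defaultdict(list)
    for yi, gi in zip(y, group):
        cells[gi].append(yi)
    out = {}
    for level in sorted(cells):
        vals = cells[level]
        sd = statistics.stdev(vals) if len(vals) > 1 else None
        out[level] = (statistics.mean(vals), sd, len(vals))
    return out


# Toy data: d equals 1 for a single observation, as in the question
y = [0.010, -0.020, 0.005, 0.015, -0.030]
d = [0, 0, 1, 0, 0]
summary = tabstat_by(y, d)
for level, (mean, sd, count) in summary.items():
    print(level, mean, sd, count)
```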
I hope this helps,
Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com
On 2011-01-21 3.34 PM, Liu Yu wrote:
Dear Statalist,
I got a weird result when I ran the following two regressions. In both, y is daily stock
return data from 1990 to 2010, x is daily market return data for the same period, and d is
a dummy variable that equals 1 on Nov-10-2001 and 0 otherwise.
The first is a simple OLS regression:
. reg y x d
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0237359     .03177     0.75   0.455    -.0385487    .0860204
           d |  -.0074946    .025867    -0.29   0.772    -.0582064    .0432172
       _cons |   .0007387   .0003825     1.93   0.054    -.0000112    .0014886
------------------------------------------------------------------------------
The second is the same regression with the "robust" option:
. reg y x d, robust
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0237359   .0304741     0.78   0.436    -.0360082    .0834799
           d |  -.0074946    .000539   -13.90   0.000    -.0085514   -.0064378
       _cons |   .0007387   .0003834     1.93   0.054    -.0000131    .0014904
------------------------------------------------------------------------------
I am quite surprised that the standard error of d drops sharply (from .025867 to .000539)
when I use the robust option, so that its t statistic changes from non-significant (-0.29)
to significant (-13.90). Should I trust the results from the second regression? Is there
anything special I need to pay attention to regarding the dummy variable and the robust
option?
Thank you all.
Catherine Liu
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*