Re: st: Strange -robust- results with a dummy variable
From: "Michael N. Mitchell" <[email protected]>
To: [email protected]
Subject: Re: st: Strange -robust- results with a dummy variable
Date: Fri, 21 Jan 2011 18:59:05 -0800
Dear Catherine,

My hunch is that you have very unequal Ns for the two levels of the dummy variable -d-,
combined with very unequal variances in the two groups. If you think of this, for the
moment, as a t-test (or an ANOVA), this would be described as violating the homogeneity
of variance assumption. This issue is discussed, in the context of an ANOVA framework,
on the web page

http://www.ats.ucla.edu/stat/stata/library/homvar.htm

As noted on that page, the "robust" option provides more appropriate p-values in such a
case.

To check whether this is the case, I suggest running the command

tabstat y, by(d) stat(mean sd n)

which shows the mean, sd, and n of y for each level of d (we are temporarily ignoring x,
for simplicity).
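
As a rough illustration of the situation described above, the following do-file sketch simulates a small group with a small variance next to a large group with a large variance, then compares the conventional and robust standard errors. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the original question, and the sample size, group sizes, standard deviations, and seed are arbitrary.

```stata
* Sketch only: simulated data, NOT the stock-return data from the question.
clear
set seed 12345
set obs 5000

* Hypothetical dummy: only 20 of the 5,000 observations have d == 1
generate d = (_n <= 20)

* Hypothetical variances: the small group (d == 1) is far less variable
generate y = cond(d, rnormal(0, 0.001), rnormal(0, 0.02))

* Group means, standard deviations, and Ns, as in the check suggested above
tabstat y, by(d) statistics(mean sd n)

* Conventional standard errors assume one common error variance
regress y d

* Robust standard errors allow the error variance to differ across observations
regress y d, vce(robust)
```

Under these assumed settings the conventional standard error for d is driven by the pooled variance, which is dominated by the large, noisy group, so the robust standard error for d should come out noticeably smaller. Reversing the pattern (giving the small group the larger variance) should push the comparison the other way.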
I hope this helps,
Michael N. Mitchell
Data Management Using Stata - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week - http://www.MichaelNormanMitchell.com
On 2011-01-21 3.34 PM, Liu Yu wrote:
Dear Statalist,
I got a strange result when I ran the following two regressions. (In these
regressions, y is a daily stock return series from 1990 to 2010, x is the
daily market return for the same period, and d is a dummy variable that
equals 1 on Nov-10-2001 and 0 otherwise.)
The first is a simple OLS regression:
. reg y x d
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0237359     .03177     0.75   0.455    -.0385487    .0860204
           d |  -.0074946    .025867    -0.29   0.772    -.0582064    .0432172
       _cons |   .0007387   .0003825     1.93   0.054    -.0000112    .0014886
------------------------------------------------------------------------------
The second is the same regression with the "robust" option added:
. reg y x d, robust
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0237359   .0304741     0.78   0.436    -.0360082    .0834799
           d |  -.0074946    .000539   -13.90   0.000    -.0085514   -.0064378
       _cons |   .0007387   .0003834     1.93   0.054    -.0000131    .0014904
------------------------------------------------------------------------------
I am quite surprised that the standard error of d decreases substantially
once I use the robust option, and that its t-statistic changes from
non-significant to significant. Should I trust the results from the second
regression? Is there anything special I need to pay attention to when using
a dummy variable together with the robust option?
Thank you all.
Catherine Liu
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/