
st: areg, robust, cluster


From   Christopher F Baum <baum@bc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: areg, robust, cluster
Date   Wed, 21 Jul 2004 22:33:24 -0400

Garrett wrote:
I'm estimating a regression on how changing political party platforms affect
vote shares. I included country-specific dummy variables, and I'm also using
robust clustered standard errors (clustering on countries) as there's likely to
be (negative) correlation between parties in vote share.
I first estimated the model without clustering, first with areg, and then with
regress and a set of dummy variables. As expected, the results were identical.
However, when I add the cluster option it looks like Stata is making different
corrections to the degrees of freedom in the t-test for statistical
significance in these models, as well as doing some other things differently.

He compared
Regression with robust standard errors                 Number of obs =     158
                                                       F(  5,     7) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.1477
Number of clusters (ctrynum) = 8                       Root MSE      =  4.5572

------------------------------------------------------------------------------
             |               Robust
       vgain |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    vgainone |  -.1540838   .0951223    -1.62   0.149    -.3790123    .0708447

with

Regression with robust standard errors                 Number of obs =     158
                                                       F(  5,   144) =   12.84
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1477
                                                       Adj R-squared =  0.0707
                                                       Root MSE      =  4.5572
                          (standard errors adjusted for clustering on ctrynum)
------------------------------------------------------------------------------
             |               Robust
       vgain |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    vgainone |  -.1540838   .0951223    -1.62   0.107    -.3421002    .0339326

and wondered why the tail probabilities of the t-statistics differ even though the coefficient and standard error are identical.

. di 2*ttail(7,1.62)
.14926394

. di 2*ttail(144,1.62)
.10742019

In the first case, the t-statistics are evaluated with 7 d.f. (the number of clusters, 8, less one).
In the second case, they are evaluated with 144 d.f., which is 14 fewer than the 158 observations; that 14 appears to be 7 + (8-1).
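
The same degrees-of-freedom difference also accounts for the two confidence intervals. As a rough check, the interval bounds can be rebuilt from the reported coefficient and standard error with invttail(), using nothing beyond the numbers shown in the two outputs above:

```stata
* Rebuild the 95% bounds from the reported coefficient and robust
* standard error, changing only the d.f. of the t distribution.
* invttail(df, p) returns the t value with upper-tail probability p.

* 7 d.f. (clusters less one); compare with -.3790123 and .0708447
display -.1540838 - invttail(7, .025)*.0951223
display -.1540838 + invttail(7, .025)*.0951223

* 144 d.f.; compare with -.3421002 and .0339326
display -.1540838 - invttail(144, .025)*.0951223
display -.1540838 + invttail(144, .025)*.0951223
```

If the displayed bounds agree with the reported ones up to rounding, the two outputs differ only in the reference distribution, not in the estimates themselves.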

I imagine the issue is that areg, with absorb() and cluster() set to the same variable, recognizes what is going on, whereas regress, with the country dummies, does not figure out that this is the same as 'absorbing' the country indicator.
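
A minimal sketch of the comparison, for anyone who wants to see which d.f. each command uses: Garrett's exact commands were not posted, so the specification below is an assumption. Only vgain, vgainone and ctrynum come from his output; the other regressors in his model are omitted here.

```stata
* Assumed specification, not the original commands.

* Fixed effects absorbed, clustering on the same variable
areg vgain vgainone, absorb(ctrynum) cluster(ctrynum)
display "areg residual d.f.:    " e(df_r)

* Same model with explicit country dummies
xi: regress vgain vgainone i.ctrynum, cluster(ctrynum)
display "regress residual d.f.: " e(df_r)

* Two-sided p-value for any t and d.f. of interest
display 2*ttail(e(df_r), 1.62)
```

e(df_r) holds the residual degrees of freedom that the last estimation command used for its t-tests, so displaying it after each command shows directly where the two p-values part ways.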

Kit
