
From: Kit Baum <kitbaum@mac.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: reg w cluster std errs
Date: Mon, 17 Oct 2005 21:17:44 -0400

David said

I've run into the situation of not getting an overall F stat when running a regression with clustered standard errors. Previous postings and Stata help say that:

1) the number of estimated coefficients must be lower than the number of clusters

2) each variable cannot have a non-zero value for just one observation

My final sample fulfils the above two.

I think the problem lies in having some of the dummy variables corresponding to only one cluster. My data is clustered by 'case investigation' (with several observations for each case) and each case falls into a particular industry. I have industry dummy variables.

Some of these industry dummies have only one corresponding 'case investigation'. When I get rid of these 'one case' industry dummies, I get an F stat.

With a cluster covariance matrix estimator, the variance estimate is built from one summed score per cluster, so as far as its rank is concerned you are essentially running a regression using one observation per cluster. That is why condition (1) above is important. In a standard regression, a dummy that equals 1 for a single observation will essentially remove that data point from the analysis: if you run the same regression without the dummy and without the observation to which it pertains, you will get the same results for the other parameters (and N-k, Root MSE, etc. will be unchanged). But the ANOVA F is distorted because it miscounts the slopes, treating that dummy as a meaningful regressor rather than a nuisance parameter. Try

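* assumes a dataset containing hours, kidslt6, and kidsge6 is already in memory
generate dum = (_n == 10)

regress hours kidslt6 kidsge6 dum

regress hours kidslt6 kidsge6 if _n != 10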
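
To see why those two regressions agree, here is a short sketch of the algebra. The notation is assumed for illustration and does not come from the original post: j is the one observation with d_j = 1, x_i holds the other regressors (including the constant), and k is the number of those other coefficients.

```latex
% Assumed notation: d_i = 1 only for observation j; x_i = other regressors (incl. constant), k of them.
\begin{align*}
y_i &= x_i'\beta + \gamma d_i + u_i, \qquad d_i = \mathbf{1}\{i = j\} \\
% Normal equation for gamma: the residual at observation j is forced to zero.
\sum_i d_i \hat{u}_i &= \hat{u}_j = 0
  \quad\Longrightarrow\quad \hat{\gamma} = y_j - x_j'\hat{\beta} \\
% The normal equations for beta then involve only the other N-1 observations.
\sum_i x_i \hat{u}_i &= \sum_{i \neq j} x_i \,(y_i - x_i'\hat{\beta}) = 0 \\
% Same residual sum of squares and the same residual degrees of freedom.
\mathrm{RSS} &= \sum_{i \neq j} \hat{u}_i^2, \qquad N - (k + 1) = (N - 1) - k
\end{align*}
```

The third line is exactly the set of normal equations for the regression that drops observation j, so the other coefficients, the residual sum of squares, and Root MSE coincide across the two runs.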

When you cluster with a dummy that is nonzero for only one cluster, the clickable help for the missing F-stat says:

Is there a regressor that is nonzero for only one observation?

The VCE you have just estimated is not of sufficient rank to perform the model test. This can happen if there is a variable in your model that is nonzero for only a single observation in the estimation sample. In that case the derivative of the sum-of-squares or likelihood function with respect to that variable's parameter is zero for all observations. That implies that the outer-product-of-gradients (OPG) variance matrix is singular. Since the OPG variance matrix is used in computing the robust variance matrix, the latter is therefore singular as well.

I think that StataCorp might want to expand this to include "is there a regressor which is nonzero for only one cluster when you are using the cluster option?"
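
The cluster version of that argument can be sketched as follows. This is a rough derivation under assumed notation, not something taken from Stata's documentation: s_g is the summed score of cluster g, G is the number of clusters, k is the total number of coefficients, d is the regressor that is nonzero only in cluster c, and the finite-sample multiplier is left out because it does not change the rank.

```latex
% Assumed notation: x_i includes d_i; d_i = 0 for every observation outside cluster c.
\begin{align*}
\hat{V} &= (X'X)^{-1} \Big( \sum_{g=1}^{G} s_g s_g' \Big) (X'X)^{-1},
  \qquad s_g = \sum_{i \in g} x_i \hat{u}_i \\
% Element of s_g belonging to d, for clusters other than c.
s_{g,d} &= \sum_{i \in g} d_i \hat{u}_i = 0 \quad \text{for } g \neq c
  \quad (\text{since } d_i = 0 \text{ outside cluster } c) \\
% Within cluster c the sum equals the full-sample sum, which OLS sets to zero.
s_{c,d} &= \sum_{i \in c} d_i \hat{u}_i = \sum_{i} d_i \hat{u}_i = 0
  \quad (\text{OLS normal equation for } d) \\
% A zero row and column in the middle matrix caps its rank, and hence that of V.
\mathrm{rank}(\hat{V}) &= \mathrm{rank}\Big( \sum_{g=1}^{G} s_g s_g' \Big) \le k - 1
\end{align*}
```

Every cluster contributes a zero in the position of d, so the middle matrix has a zero row and column and the robust VCE cannot have full rank, even though d may be nonzero for several observations.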

Kit Baum, Boston College Economics

http://ideas.repec.org/e/pba1.html

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/
