
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: When number of regressors greater than the number of clusters in OLS regression
Date: Mon, 1 Sep 2008 19:19:48 -0400

Thanks, Mark. I had been thinking that, because the data were not *sampled* as clusters, there would be no cluster effects. That assumption was wrong, and I agree that cluster effects should be considered. As Vince Wiggins stated in http://www.stata.com/statalist/archive/2005-10/msg00594.html , "We can use the [robust] covariance matrix to test any subset of joint hypotheses that does not exceed its rank." Thus Divya can get valid standard errors for single coefficients if she specifies states as clusters, and she can probably make most of the inferences she is interested in.
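The rank limit in the quotation can be illustrated with a small sketch. It is not Stata code and not Vince's derivation; it only assumes the usual sandwich form, in which the middle ("meat") matrix is a sum of one outer product per cluster, so its rank cannot exceed the number of clusters. The data are made up, the residuals are placeholders rather than residuals from a fitted regression, and the names `matrix_rank` and `cluster_meat` are hypothetical.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over rational numbers."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def cluster_meat(X, resid, cluster_ids):
    """Sum over clusters g of s_g s_g', where s_g = sum of x_i * e_i in g."""
    k = len(X[0])
    scores = {}
    for x_row, e, g in zip(X, resid, cluster_ids):
        s = scores.setdefault(g, [0] * k)
        for j in range(k):
            s[j] += x_row[j] * e
    meat = [[0] * k for _ in range(k)]
    for s in scores.values():
        for a in range(k):
            for b in range(k):
                meat[a][b] += s[a] * s[b]
    return meat


# Made-up integer data: 6 regressors but only 3 clusters (assumed values).
rng = random.Random(2008)
n_obs, k, G = 60, 6, 3
X = [[rng.randint(-5, 5) for _ in range(k)] for _ in range(n_obs)]
resid = [rng.randint(-3, 3) for _ in range(n_obs)]  # placeholder residuals
cluster_ids = [i % G for i in range(n_obs)]

meat = cluster_meat(X, resid, cluster_ids)
print("regressors:", k, "clusters:", G, "rank of meat:", matrix_rank(meat))
```

The 6 x 6 matrix has rank of at most 3 here, so it can support tests of single coefficients but not a joint test of all six. With actual OLS residuals the bound is one lower still, because the cluster scores then sum to zero.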

-xtreg- offers some intriguing possibilities, for it would distinguish between state-level and district-level predictors of the same kind. Of course statistics from neighboring districts may be spatially correlated, opening up a completely different area of analysis.

Perhaps the best advice to Divya that I can give, in addition to Mark's:

Clarify your purpose--is the study exploratory ("find a good predictive model")? Or are you testing hypotheses about certain predictors? If your analysis is exploratory, consider holding out a random set of districts or states on which to test the fit of your "best" models. If you are interested in certain predictors, then the others are potential effect modifiers and confounders, and you probably don't need them all. Do you have 25 predictors because you know from other studies that they are all important? The more unnecessary predictors you have in one model, the more difficult it will be to tease out the truly important ones.
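A minimal sketch of the hold-out idea, assuming whole states are set aside so that no state contributes districts to both the fitting and the test sets. The function name `split_by_cluster`, the 25% share, the seed, and the toy records are all hypothetical choices for illustration, not part of the original advice.

```python
import random


def split_by_cluster(records, cluster_key, holdout_share=0.25, seed=12345):
    """Hold out a random set of whole clusters (e.g. states)."""
    clusters = sorted({r[cluster_key] for r in records})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_holdout = max(1, round(len(clusters) * holdout_share))
    held = set(rng.sample(clusters, n_holdout))
    train = [r for r in records if r[cluster_key] not in held]
    test = [r for r in records if r[cluster_key] in held]
    return train, test, held


# Toy data: 8 made-up states with 5 districts each.
records = [{"state": s, "district": d} for s in "ABCDEFGH" for d in range(5)]
train, test, held = split_by_cluster(records, "state")
print("held-out states:", sorted(held))
print("fit on", len(train), "districts; test on", len(test))
```

Holding out districts at random instead would leave districts from the same state on both sides of the split, which may overstate fit if errors are correlated within states.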

-Steve

On Sep 1, 2008, at 6:00 PM, Schaffer, Mark E wrote:

Whether or not you need to use cluster-robust depends on whether you think your data have a problem that cluster-robust can address, namely (1) the error terms in your equation are correlated within states because of unobserved heterogeneity (so the iid assumption fails), but (2) the error terms are not correlated across states.

A good example would be whether you are looking at something that is affected by state-level regulation, i.e., the laws regulating it vary from state to state, but you don't have variables that control for this somehow.

