
Re: st: When number of regressors greater than the number of clusters in OLS regression

From   Steven Samuels
Subject   Re: st: When number of regressors greater than the number of clusters in OLS regression
Date   Mon, 1 Sep 2008 19:19:48 -0400

Thanks, Mark. I had been thinking that, because the data were not *sampled* as clusters, there would be no cluster effects. That assumption was wrong, and I agree that cluster effects should be considered. As Vince Wiggins stated in archive/2005-10/msg00594.html, "We can use the [robust] covariance matrix to test any subset of joint hypotheses that does not exceed its rank." Thus Divya can get valid standard errors for single coefficients if she adds states as clusters, and she can probably make most of the inferences she is interested in.

-xtreg- offers some intriguing possibilities, for it would distinguish between state-level and district-level predictors of the same kind. Of course statistics from neighboring districts may be spatially correlated, opening up a completely different area of analysis.

Perhaps the best advice to Divya that I can give, in addition to Mark's:

Clarify your purpose--is the study exploratory ("find a good predictive model")? Or are you testing hypotheses about certain predictors? If your analysis is exploratory, consider holding out a random set of districts or states on which to test the fit of your "best" models. If you are interested in certain predictors, then the others are potential effect modifiers and confounders. You probably don't need them all. Do you have 25 predictors because you know from other studies that they are all important? The more unnecessary predictors you have in one model, the more difficult it will be to tease out the truly important ones.
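To make the hold-out suggestion concrete, here is a minimal sketch in Python (standard library only) of setting aside a random set of whole states rather than individual districts. The function name `split_by_cluster`, the 20% hold-out share, the seed, and the toy data are all assumptions made for illustration; nothing here comes from Divya's data.

```python
import random

def split_by_cluster(rows, cluster_key, holdout_share=0.2, seed=12345):
    """Hold out whole clusters (e.g. states), not individual rows.

    Hypothetical helper for illustration: every row belonging to a
    held-out cluster goes to the test set, so no cluster is split.
    """
    clusters = sorted({row[cluster_key] for row in rows})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_holdout = max(1, round(holdout_share * len(clusters)))
    holdout = set(rng.sample(clusters, n_holdout))
    train = [r for r in rows if r[cluster_key] not in holdout]
    test = [r for r in rows if r[cluster_key] in holdout]
    return train, test, holdout

# Made-up data: 10 states with 4 districts each
rows = [
    {"state": f"S{s:02d}", "district": f"S{s:02d}-D{d}"}
    for s in range(10)
    for d in range(4)
]
train, test, holdout = split_by_cluster(rows, "state")
```

Splitting by state keeps all districts of a held-out state together, so within-state correlation cannot leak from the fitting set into the test set. Holding out random districts instead would be the same idea with `cluster_key` set to the district identifier.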


On Sep 1, 2008, at 6:00 PM, Schaffer, Mark E wrote:

Whether or not you need to use cluster-robust depends on whether you
think your data have a problem that cluster-robust can address, namely
(1) the error terms in your equation are correlated within states
because of unobserved heterogeneity (so the iid assumption fails), but
(2) the error terms are not correlated across states.

A good example would be whether you are looking at something that is
affected by state-level regulation, i.e., the laws regulating it vary
from state to state, but you don't have variables that control for this