Statalist



Re: st: question about cluster() option in regression


From   Steven Samuels <sjhsamuels@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: question about cluster() option in regression
Date   Thu, 12 Feb 2009 11:19:00 -0500

--

Thus there is nothing "improper" about the cluster option. The difference in estimated coefficients is due to a difference in how observations are weighted, not to the cluster() option itself, which changes only the standard errors. In -reg-, with or without the cluster option, all observations get the same weight unless you specify otherwise. Multi-level models, on the other hand, estimate the variances of the random effects and use that information to give observations different weights.
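To make the weighting point concrete, here is a minimal Python sketch (not Stata output) for the simplest case: an intercept-only model with unbalanced clusters. The function names, the toy data, and the variance components tau2 and sigma2 are illustrative assumptions; a real multi-level program estimates the variance components rather than taking them as given.

```python
from statistics import mean

def pooled_mean(clusters):
    """Intercept-only OLS: every observation gets the same weight."""
    values = [y for ys in clusters for y in ys]
    return mean(values)

def random_intercept_mean(clusters, tau2, sigma2):
    """GLS estimate of the overall mean under a random-intercept model.

    Assumes the between-cluster variance (tau2) and the within-cluster
    variance (sigma2) are known. Each cluster mean is weighted by the
    inverse of its variance, 1 / (tau2 + sigma2 / n_j).
    """
    weights = [1.0 / (tau2 + sigma2 / len(ys)) for ys in clusters]
    means = [mean(ys) for ys in clusters]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Hypothetical data: a large cluster centred on 10, a small one centred on 20.
clusters = [
    [9.0, 11.0, 10.0, 10.0, 9.0, 11.0, 10.0, 10.0],
    [19.0, 21.0],
]

ols = pooled_mean(clusters)
gls = random_intercept_mean(clusters, tau2=4.0, sigma2=1.0)
print(f"equal weights (OLS):        {ols:.3f}")
print(f"random-intercept weighting: {gls:.3f}")
```

With tau2 = 0 the two estimates coincide. As tau2 grows, the random-intercept estimate moves toward the unweighted average of the cluster means, so the small cluster counts for more than its two observations would under equal weighting.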

Contrary to your impression, this phenomenon is not "rare" in statistical software. In SAS, for example, you will find the same difference between estimates from: 1) the GENMOD and SURVEYREG procedures, which have equivalent cluster options, and 2) the multi-level MIXED procedure.

The main difference between the two approaches, cluster or multi-level, is that the cluster() option provides model-free standard errors: they do not depend on a correct model for the correlation within clusters. The multi-level programs require a correct model for the variance structure, for example that standard deviations are constant at each level. If the model for the variance structure is correct, estimates from multi-level programs will be more efficient, and standard errors will be more precisely estimated. In Stata and SAS, you can combine the advantages of both: fit a multi-level model, but get cluster-robust standard errors. In -gllamm-, you can go further: fit a multi-level model but also account for clusters that are not part of the model.
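As a rough illustration of what "model-free" means here, the following Python sketch computes a cluster-robust (sandwich) standard error for the simplest possible regression, an intercept-only model, and compares it with the usual standard error that assumes independent observations. The function names and data are hypothetical, and the G/(G-1) factor is one common finite-sample correction; software differs in the exact factor applied.

```python
def naive_se(clusters):
    """Usual standard error of the mean, assuming independent observations."""
    values = [y for ys in clusters for y in ys]
    n = len(values)
    ybar = sum(values) / n
    s2 = sum((y - ybar) ** 2 for y in values) / (n - 1)
    return (s2 / n) ** 0.5

def cluster_robust_se(clusters):
    """Cluster-robust standard error of the mean.

    Residuals are summed within each cluster before squaring, so no
    assumption is made about the correlation inside a cluster.
    """
    values = [y for ys in clusters for y in ys]
    n = len(values)
    g = len(clusters)
    ybar = sum(values) / n
    meat = sum(sum(y - ybar for y in ys) ** 2 for ys in clusters)
    correction = g / (g - 1)  # assumed finite-sample correction
    return (correction * meat) ** 0.5 / n

# Hypothetical data: observations within a cluster resemble each other.
clusters = [
    [1.0, 2.0, 1.0],
    [5.0, 6.0, 5.0],
    [9.0, 10.0, 9.0],
    [3.0, 4.0, 3.0],
]

se_naive = naive_se(clusters)
se_cluster = cluster_robust_se(clusters)
print(f"naive SE:          {se_naive:.3f}")
print(f"cluster-robust SE: {se_cluster:.3f}")
```

The point estimate (the mean) is the same in both cases; only the standard error changes. With positively correlated observations within clusters, as in this toy data, the cluster-robust standard error is the larger of the two.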

-Steve

On Feb 12, 2009, at 10:11 AM, ronggui wrote:

Hi all,

When the data are clustered rather than independent, there are many
possible ways to handle it, such as dummy variables, panel data models
(fixed or random effects), GEE, and multilevel models (e.g.
http://www.gseis.ucla.edu/courses/ed230bc1/notes3/cluster.html). Of
course, the cluster() option of reg is one quick choice too. Yet I have
noticed that the results (especially the coefficients) from reg differ
from those of a multilevel model. Furthermore, this kind of adjustment
is rare in other statistical software. All of this makes me doubt the
practice of regression with the cluster() option as done in Stata. I
wonder if there is a paper on this, especially on whether this
adjustment is proper or improper. Besides, can we predict the change in
significance level from using cluster()? Thanks in advance.
