
Re: st: question about cluster() option in regression

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: question about cluster() option in regression
Date	Thu, 12 Feb 2009 11:19:00 -0500

--

Thus there is nothing "improper" about the cluster option. The difference in estimated coefficients is due to a difference in how observations are weighted. In -reg- with or without a cluster option, all observations get the same weight, unless you specify otherwise. Multi-level models, on the other hand, estimate variances of the random effects and use the information to give observations different weights.
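The weighting difference can be seen in a minimal sketch, which is not part of the original post. It assumes an intercept-only model, two made-up clusters of unequal size, and variance components (`tau2` between clusters, `sigma2` within clusters) that are simply treated as known; a real multi-level program would estimate them from the data. All function names are hypothetical.

```python
# Illustration only: the data and the variance components are assumptions.

def ols_mean(clusters):
    """Equal weight for every observation: the plain grand mean."""
    values = [y for ys in clusters for y in ys]
    return sum(values) / len(values)


def random_intercept_mean(clusters, tau2, sigma2):
    """GLS estimate of the overall mean under a random-intercept model.

    Each cluster mean gets weight 1 / (tau2 + sigma2 / n_j), where tau2 is
    the between-cluster variance, sigma2 the within-cluster variance, and
    n_j the cluster size. Variance components are assumed known here.
    """
    weights = [1.0 / (tau2 + sigma2 / len(ys)) for ys in clusters]
    means = [sum(ys) / len(ys) for ys in clusters]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# One large cluster (8 observations, mean 10) and one small cluster
# (2 observations, mean 20).
clusters = [
    [8.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 12.0],
    [19.0, 21.0],
]

equal_weight_estimate = ols_mean(clusters)
multilevel_estimate = random_intercept_mean(clusters, tau2=4.0, sigma2=1.0)

print("equal weights:     ", equal_weight_estimate)
print("random intercept:  ", round(multilevel_estimate, 4))
```

With a sizeable between-cluster variance, the small cluster counts for almost as much as the large one, so the estimate moves toward the small cluster's mean. Setting `tau2=0.0` makes the weights proportional to cluster size and reproduces the equal-weight estimate.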

Contrary to your impression, this phenomenon is not "rare" in statistical software. In SAS, for example, you will find the same difference between estimates from: 1) the GENMOD and SURVEYREG programs, which have equivalent cluster options, and 2) the multi-level MIXED program.

The main difference between the two approaches, cluster or multilevel, is that the cluster() option provides model-free standard errors. The multi-level programs require a correct model for the variance structure, for example that standard deviations are constant at each level. If the model for the variance structure is correct, estimates from multi-level programs will be more efficient, and standard errors will be more precisely estimated. In Stata and SAS, you can combine the advantages of both: fit a multi-level model, but get cluster-robust standard errors. In -gllamm-, you can go further, fit a multi-level model but also account for clusters that are not part of the model.
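To make "model-free standard errors" concrete, here is a minimal sketch that is not part of the original post. It assumes an intercept-only model and made-up data, and it leaves out the small-sample correction factor that packages typically apply, so its numbers would not match software output exactly. The function name is hypothetical.

```python
import math


def mean_with_standard_errors(clusters):
    """Return (estimate, conventional SE, cluster-robust SE) for a mean.

    Illustration only. The point estimate is the same either way; only the
    standard error changes when clustering is taken into account.
    """
    values = [y for ys in clusters for y in ys]
    n = len(values)
    estimate = sum(values) / n

    # Conventional SE: treats all observations as independent with a
    # common variance.
    s2 = sum((y - estimate) ** 2 for y in values) / (n - 1)
    se_iid = math.sqrt(s2 / n)

    # Cluster-robust (sandwich) SE: residuals are summed within each
    # cluster before squaring, so no model for the within-cluster
    # correlation is needed. No finite-sample correction is applied.
    cluster_sums = [sum(y - estimate for y in ys) for ys in clusters]
    se_cluster = math.sqrt(sum(s ** 2 for s in cluster_sums)) / n

    return estimate, se_iid, se_cluster


# Four made-up clusters whose members resemble each other closely.
clusters = [
    [9.0, 10.0, 11.0],
    [19.0, 20.0, 21.0],
    [4.0, 5.0, 6.0],
    [14.0, 15.0, 16.0],
]

estimate, se_iid, se_cluster = mean_with_standard_errors(clusters)
print("estimate:          ", estimate)
print("conventional SE:   ", round(se_iid, 4))
print("cluster-robust SE: ", round(se_cluster, 4))
```

Because observations within a cluster are alike in this data set, the cluster-robust standard error comes out larger than the conventional one, while the estimate itself does not change.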


-Steve

On Feb 12, 2009, at 10:11 AM, ronggui wrote:

Hi all,

When the data are clustered rather than independent, there are many
possible ways to handle it, such as dummy variables, panel data models
(fixed or random effects), GEE, and multilevel models (e.g.
http://www.gseis.ucla.edu/courses/ed230bc1/notes3/cluster.html). Of
course, the cluster() option of reg is one quick choice too. Yet I have
noticed that the results (especially the coefficients) from reg differ
from those of a multilevel model. Furthermore, this kind of adjustment
is rare in other statistical software. All of this makes me doubt the
practice of regression with the cluster() option in the Stata way. I
wonder if there is a paper on this, especially on whether this
adjustment is proper or improper. Besides, can we predict the change in
the level of significance from using cluster()? Thanks in advance.
