The Stata listserver

Re: st: regression analysis with few clusters

From   Roger Newson <[email protected]>
To   [email protected]
Subject   Re: st: regression analysis with few clusters
Date   Mon, 03 Jan 2005 19:13:54 +0000

At 18:13 03/01/2005, Krishna wrote:
Hi,

I was hoping I could get some advice on doing
regression analysis when the data comes from few
clusters. The data I am using comes from 12 clusters
and the mean cluster size is around 30. I understand
that some standard methods for regression analysis
using clustered data (e.g. multiple linear regression,
GEE) provide consistent estimates only when the number
of clusters is large. What would be the best approach
in my case where there are few clusters of reasonably
large size? Would RE, RE with ML or FE models be
appropriate? Would love to hear from you and many
thanks in advance.

Krishna does not give many details. However, a fixed cluster effect model estimates a different parameter vector from the one estimated by a clustered model. The fixed effect model estimates the effects that would be observed if only we could sample an enormous number of individuals from the population of individuals in the clusters that we have. A clustered model (including those used in defining GEEs and even GLLAMMs) estimates the effects that would be observed if only we sampled an enormous number of clusters (of similar size to the clusters we have) from the population of clusters. (There is also a difference between GEE parameters and GLLAMM parameters, namely that GEEs measure marginal effects in the population of clusters and GLLAMMs measure conditional effects in the population of clusters, but that is a separate issue.)
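
To make the distinction concrete, here is a minimal simulation sketch. It is not based on Krishna's data and is written in Python with NumPy. The data-generating model (12 clusters of 30, normally distributed cluster intercepts, a true slope of 2) and the function names simulate, within_slope and pooled_slope_cluster_se are all assumptions made purely for illustration. The first estimator conditions on the clusters at hand by demeaning within cluster, which is the fixed-effects approach. The second fits a pooled regression and attaches a cluster-robust sandwich standard error, whose justification rests on the number of clusters being large.

```python
import numpy as np


def simulate(n_clusters=12, cluster_size=30, beta=2.0, seed=0):
    """Simulate y = beta*x + cluster intercept + noise (illustrative model only)."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    u = rng.normal(0.0, 1.0, n_clusters)          # one intercept per cluster
    x = rng.normal(0.0, 1.0, cluster.size)        # covariate, independent of u
    y = beta * x + u[cluster] + rng.normal(0.0, 1.0, cluster.size)
    return cluster, x, y


def within_slope(cluster, x, y):
    """Fixed-effects slope: OLS on variables demeaned within each cluster."""
    counts = np.bincount(cluster)
    x_bar = np.bincount(cluster, weights=x) / counts
    y_bar = np.bincount(cluster, weights=y) / counts
    xd = x - x_bar[cluster]
    yd = y - y_bar[cluster]
    return float(xd @ yd / (xd @ xd))


def pooled_slope_cluster_se(cluster, x, y):
    """Pooled OLS slope with a cluster-robust (sandwich) standard error."""
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    groups = np.unique(cluster)
    meat = np.zeros((k, k))
    for g in groups:
        rows = cluster == g
        score = X[rows].T @ resid[rows]           # summed score for this cluster
        meat += np.outer(score, score)
    # A commonly used finite-sample factor; with few clusters it matters.
    G = groups.size
    factor = G / (G - 1) * (n - 1) / (n - k)
    V = factor * xtx_inv @ meat @ xtx_inv
    return float(b[1]), float(np.sqrt(V[1, 1]))


cluster, x, y = simulate()
b_fe = within_slope(cluster, x, y)
b_ols, se_ols = pooled_slope_cluster_se(cluster, x, y)
print(f"fixed-effects slope:     {b_fe:.3f}")
print(f"pooled slope (robust SE): {b_ols:.3f} ({se_ols:.3f})")
```

In this sketch the covariate is generated independently of the cluster intercepts, so both slopes target the same value. With real data the two can differ, and the robust standard error is based on only 12 cluster-level contributions.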

If the number of clusters is only 12, then it might be argued that we cannot really have a representative sample from the population of clusters from which to estimate dispersion parameters of that population. It might therefore be more realistic for Krishna to aim for the less ambitious goal of estimating parameters of the population of individuals in these 12 clusters, in the hope that these clusters are not too different from the other clusters in the total population of clusters. However, most of us would need to know more specific details about what Krishna is trying to measure before giving any definitive advice.
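
The difficulty of estimating a dispersion parameter from 12 clusters can be illustrated with a rough Monte Carlo sketch in Python with NumPy. It assumes a best case that real data will not offer: the cluster effects are observed directly, without within-cluster noise, and are normally distributed with a true standard deviation of 1. The function name sd_estimate_spread and all parameter values are hypothetical.

```python
import numpy as np


def sd_estimate_spread(n_clusters=12, true_sd=1.0, n_reps=5000, seed=0):
    """Return the central 95% range of the sample SD of n_clusters cluster effects."""
    rng = np.random.default_rng(seed)
    # Each row is one hypothetical study with n_clusters cluster effects.
    effects = rng.normal(0.0, true_sd, size=(n_reps, n_clusters))
    sd_hat = effects.std(axis=1, ddof=1)          # between-cluster SD per study
    lo, hi = np.percentile(sd_hat, [2.5, 97.5])
    return float(lo), float(hi)


for g in (12, 100):
    lo, hi = sd_estimate_spread(n_clusters=g)
    print(f"{g:>3} clusters: 95% of SD estimates fall in [{lo:.2f}, {hi:.2f}]")
```

Even in this idealised setting, the range of estimates is much wider with 12 clusters than with 100, which is one way of seeing why the less ambitious goal may be the more realistic one.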

I hope this helps.

Best wishes


Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: [email protected]

Opinions expressed are those of the author, not the institution.
