The Stata listserver

Re: st: Mantel-Haenszel vs. clustered logistic - please help

From   Constantine Daskalakis <[email protected]>
To   [email protected]
Subject   Re: st: Mantel-Haenszel vs. clustered logistic - please help
Date   Fri, 23 May 2003 16:08:28 -0400

> Dear all,
> I am analyzing data from the 306 women at 4 outpatient
> clinics in Oklahoma. Each woman was asked if they
> performed monthly breast exams and other additional
> data (covariates) such as race was collected. We would
> like to characterize women according to these
> covariates. I am concerned that these women are
> from different clinics and would like to take this
> into account.
> My first thought was to perform a logistic regression
> clustering on clinic.

> Summary:
> The raw OR:           1.97  (1.10,3.54) p= 0.0138
> Mantel-Haenszel OR:   2.94  (1.44,5.98) p= 0.0018
> Cluster logistic OR:  1.97  (0.77,5.06) p= 0.158
> Regards,
> Ricardo

There are two issues here.

1. Do you need to control for clinic in your model? This is necessary if clinic is a confounder, ie, if the clinics have varying rates of SBE (self breast examination) and also differ in the distribution of the covariate of interest.

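Your raw and clustered logistic analyses do not control for this (the cluster option changes only the standard errors, not the point estimate, which is why both ORs are 1.97). The MH analysis, stratified by clinic, does. You can get comparable results by including appropriate dummy variables for clinic in your logistic models.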

Looking at your results, it seems that you do need to control for clinic.
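To make the difference between a raw and a clinic-adjusted OR concrete, here is a minimal Python sketch of the Mantel-Haenszel pooled odds ratio. The two 2x2 tables are hypothetical, chosen only so that each clinic has an OR of 3 while the pooled (raw) table gives a much smaller OR; they are not Ricardo's data, and the function names are made up for illustration.

```python
# Hypothetical 2x2 tables, one per clinic (NOT the poster's data).
# Each tuple is (a, b, c, d):
#   a = exposed, outcome yes      b = exposed, outcome no
#   c = unexposed, outcome yes    d = unexposed, outcome no
tables = [
    (6, 2, 30, 30),   # clinic A: few exposed women, high outcome rate
    (30, 30, 2, 6),   # clinic B: many exposed women, low outcome rate
]


def crude_or(tables):
    """Odds ratio from the single table obtained by ignoring clinic."""
    a = sum(t[0] for t in tables)
    b = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables)
    d = sum(t[3] for t in tables)
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


if __name__ == "__main__":
    print("crude OR:", round(crude_or(tables), 3))
    print("MH OR:   ", round(mantel_haenszel_or(tables), 3))
```

In this made-up example each clinic's own OR is 3, and the MH estimate recovers it, while the raw OR is pulled toward 1 because exposure and outcome rates both differ by clinic. That is the pattern that suggests clinic is acting as a confounder.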

2. Do you need to control for the clustering (ie, correlation of observations) within clinics?

Usual logistic regression assumes that all observations are independent. This may not hold for observations that come from the same clinic (ie, the outcomes for two women from the same clinic may be more similar than the outcomes for two women from different clinics). The cluster/robust options account for this lack of independence.

I don't agree with Mark that "clustering on clinic is not a good idea." It does not matter how many clusters you have, as long as you account for the within-cluster correlation. The robust variance certainly does not treat the data as 4 observations. This is how I'd explain it.

Without clustering (ie, independence for all observations), you have 306 observations.

With perfect clustering (ie, perfect correlation within each clinic, but independence across clinics), you have 4 observations.

The cluster/robust option will, in effect, put you somewhere in between, depending on the actual degree of correlation in your data. This is as it should be, no? You don't really have info from 306 independent observations, but from fewer (how many fewer depends on how correlated they are).
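The "somewhere in between" reasoning above can be sketched with the usual design-effect approximation, n_eff = n / (1 + (m - 1) * rho), where m is the cluster size and rho the within-clinic (intraclass) correlation. This is only a back-of-the-envelope illustration: it assumes 4 equal-sized clinics (which may not be true here), the rho values are hypothetical, and it is not what the robust variance estimator literally computes.

```python
def effective_n(n_obs, n_clusters, rho):
    """Approximate effective sample size under the design-effect formula.

    Assumes equal cluster sizes m = n_obs / n_clusters and a common
    within-cluster correlation rho (both are simplifying assumptions).
    """
    m = n_obs / n_clusters
    design_effect = 1 + (m - 1) * rho
    return n_obs / design_effect


if __name__ == "__main__":
    # rho = 0 is full independence, rho = 1 is perfect clustering;
    # the intermediate values are hypothetical.
    for rho in (0.0, 0.01, 0.05, 0.2, 1.0):
        print(rho, round(effective_n(306, 4, rho), 1))
```

With rho = 0 this returns all 306 observations, with rho = 1 it returns 4, and any intermediate correlation gives something in between.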

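Looking at your results, it seems that there's a non-trivial within-clinic correlation (the clustered CI, 0.77 to 5.06, is much wider than the raw one, 1.10 to 3.54). -logit- with the cluster option fits the model under an "independence" working correlation and then corrects the standard errors for clustering. You can use -xtlogit- with the pa option and an exchangeable working correlation to get potentially more efficient estimates, ie, tighter standard errors.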


Constantine Daskalakis, ScD
Assistant Professor,
Biostatistics Section, Thomas Jefferson University,
125 S. 9th St. #402, Philadelphia, PA 19107
Tel: 215-955-5695
Fax: 215-503-3804
Email: [email protected]
