
From: "Michael I. Lichter" <mlichter@buffalo.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: cluster() or svy? (analysis of cluster-randomized trials)
Date: Tue, 09 Sep 2008 16:32:54 -0400

Thanks to Austin and Jeph for responding. In reply to Jeph ...

> I think there are good reasons to avoid both. You don't say what kinds of analyses you have, but see ssc describe cltest for some tools and a reference for analyzing cluster randomized outcomes using adjustments to the standard chi-2 and t-tests.

Can you explain why to avoid both? Aren't they adjusting for the same phenomenon--clustering of observations? I'll describe the analyses, but that will take some background ...

This is a small trial of an intervention designed to promote guideline-based diagnosis and treatment of patients with chronic kidney disease (CKD). Four medical practices were selected and two each were randomly assigned to control and intervention. (Yes, I know that it is not recommended to run a CRT with fewer than 5 clusters per arm.) Primary indicators include glomerular filtration rate (GFR) and whether or not patients with substandard GFR were diagnosed during the trial period as having CKD. We predict stable or rising GFR in intervention practices compared to falling GFR in control practices, and higher rates of physician-diagnosed CKD in intervention practices compared to control practices. The universe of patients is those with substandard GFR levels prior to the intervention.

For GFR, I was planning to regress pre/post absolute change in GFR on a dummy for control vs. not. (I'd like to include covariates like age and sex, but don't have the degrees of freedom). In partial answer to Austin's question about differences in results between cluster() and svy, and also to ask about a problem with clttest, I've included output below for this regression (1) unclustered, (2) with the cluster() option, (3) with the svy command, and (4) with clttest -- which isn't a regression but does essentially the same thing in this instance.
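The svy: prefix needs the design declared with svyset first, and that step is not shown in the log below. A minimal sketch, assuming the practices were declared as the only sampling unit with no weights or strata (this is consistent with the 1 stratum and 4 PSUs that the svy output reports, but it remains an assumption):

```stata
* Assumed design declaration; not part of the original log
svyset rsiteid          // practice ID as the primary sampling unit
svydescribe             // check the design: expect 1 stratum and 4 PSUs
```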

. reg gfr_achg rcontrol /* unclustered */

      Source |       SS       df       MS              Number of obs =     159
-------------+------------------------------           F(  1,   157) =    0.30
       Model |  30.4456806     1  30.4456806           Prob > F      =  0.5834
    Residual |  15825.7933   157  100.801231           R-squared     =  0.0019
-------------+------------------------------           Adj R-squared = -0.0044
       Total |   15856.239   158  100.355943           Root MSE      =   10.04

------------------------------------------------------------------------------
    gfr_achg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rcontrol |  -.9589666   1.744912    -0.55   0.583    -4.405498    2.487565
       _cons |  -.2553191   1.464482    -0.17   0.862    -3.147948     2.63731
------------------------------------------------------------------------------

. reg gfr_achg rcontrol, cluster(rsiteid) /* clustered */

Linear regression                                      Number of obs =     159
                                                       F(  1,     3) =   17.62
                                                       Prob > F      =  0.0247
                                                       R-squared     =  0.0019
Number of clusters (rsiteid) = 4                       Root MSE      =   10.04

------------------------------------------------------------------------------
             |               Robust
    gfr_achg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rcontrol |  -.9589666   .2284233    -4.20   0.025    -1.685911   -.2320217
       _cons |  -.2553191    .223962    -1.14   0.337    -.9680661    .4574278
------------------------------------------------------------------------------

. svy: reg gfr_achg rcontrol /* survey */
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =       159
Number of PSUs     =         4                  Population size    =       159
                                                Design df          =         3
                                                F(   1,      3)    =     17.74
                                                Prob > F           =    0.0245
                                                R-squared          =    0.0019

------------------------------------------------------------------------------
             |             Linearized
    gfr_achg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rcontrol |  -.9589666   .2276993    -4.21   0.024    -1.683607   -.2343258
       _cons |  -.2553191   .2232521    -1.14   0.336     -.965807    .4551687
------------------------------------------------------------------------------

. estat effects

----------------------------------------------------------
             |             Linearized
    gfr_achg |      Coef.   Std. Err.       Deff      Deft
-------------+--------------------------------------------
    rcontrol |  -.9589666   .2276993     .012158   .110264
       _cons |  -.2553191   .2232521     .013712   .117098
----------------------------------------------------------

. clttest gfr_achg, by(rcontrol) cluster(rsiteid) /* clustered t-test */

t-test adjusted for clustering
gfr_achg by rcontrol, clustered by rsiteid
------------------------------------------------------------------------
Intra-cluster correlation = -0.0267
------------------------------------------------------------------------
               N    Clusts      Mean        SE          95 % CI
rcontrol=0    47         2   -0.2553    0.7924   [-10.3243,  9.8137]
rcontrol=1   112         2   -1.2143         .   [       .,       .]
------------------------------------------------------------------------
Combined     159         2   -0.9308         .   [       .,       .]
------------------------------------------------------------------------
Diff(0-1)    159         4    0.9590         .   [       .,       .]

Degrees freedom: 2

                Ho: mean(-) = mean(diff) = 0

  Ha: mean(diff) < 0       Ha: mean(diff) ~= 0       Ha: mean(diff) > 0
      t =   2.1346             t =   2.1346              t =   2.1346
  P < t =   0.9168         P > |t| = 0.1664          P > t =   0.0832

Suggestions on why the t-test didn't work (it didn't calculate SE) would be welcome--it worked fine for a t-test of differences in the post-GFR itself.
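One cross-check that does not depend on clttest is a cluster-level analysis: collapse the data to one mean per practice and run an ordinary t-test on the four practice means. This is a sketch of a different, simpler technique, not a reimplementation of clttest. It weights the practices equally regardless of size, and with two practices per arm it has only 2 degrees of freedom, so it should be read as a sanity check at most.

```stata
* Cluster-level summary analysis (swapped-in technique, not clttest itself)
preserve
collapse (mean) gfr_achg, by(rsiteid rcontrol)   // one row per practice
list rsiteid rcontrol gfr_achg                   // inspect the four practice means
ttest gfr_achg, by(rcontrol)                     // 2 practices vs 2 practices
restore
```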

BTW, you might have noticed that the SEs are *smaller* in the cluster/svy model compared to the unclustered model. That's because the internal variation within clusters is much larger than the differences between them--you can see this also in the deff and deft being less than 1.0. Does this give me an excuse to treat the data as unclustered?
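On the small gap between the cluster() and svy standard errors above: it appears to be a finite-sample scaling difference and nothing more. The sketch below assumes that regress with cluster() scales the variance by (N-1)/(N-k) * G/(G-1), while svy linearization with a single stratum uses G/(G-1) only, where N is the number of observations, k the number of coefficients, and G the number of clusters. Under that assumption, the two standard errors differ by a factor of sqrt((N-1)/(N-k)).

```stata
* Assumed small-sample factors:
*   regress, cluster():             (N-1)/(N-k) * G/(G-1)
*   svy linearization, 1 stratum:   G/(G-1)
* Here N = 159 and k = 2 (rcontrol plus the constant).
scalar adj = sqrt((159 - 1) / (159 - 2))
display adj
display .2276993 * adj    // svy SE for rcontrol, rescaled
display .2232521 * adj    // svy SE for _cons, rescaled
```

If the assumption holds, the rescaled values should land very close to the cluster() standard errors of .2284233 and .223962 reported above.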

On the other hand, when I look at ckd2 (diagnosed with CKD) for those not diagnosed before the start of the study (ckd1 == 0), I get a substantial design effect:

. svy: logit ckd2 rcontrol if ckd1==0
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =         1                  Number of obs      =       259
Number of PSUs     =         4                  Population size    =       259
                                                Design df          =         3
                                                F(   1,      3)    =     10.64
                                                Prob > F           =    0.0471

------------------------------------------------------------------------------
             |             Linearized
        ckd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rcontrol |  -2.058052   .6309325    -3.26   0.047     -4.06596   -.0501428
       _cons |   1.015231   .4127449     2.46   0.091    -.2983078    2.328769
------------------------------------------------------------------------------

. estat effects

----------------------------------------------------------
             |             Linearized
        ckd2 |      Coef.   Std. Err.       Deff      Deft
-------------+--------------------------------------------
    rcontrol |  -2.058052   .6309325     4.61385   2.14799
       _cons |   1.015231   .4127449     3.11419   1.76471
----------------------------------------------------------

Does all that make sense?
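For reading the two estat effects tables: Deft is the square root of Deff, so Deff is the ratio of the design-based variance to the variance under simple random sampling, and Deft is the corresponding ratio of standard errors. A quick check with the values copied from the output above:

```stata
* Deft^2 should reproduce Deff (values copied from the estat effects tables)
display .110264^2     // rcontrol, GFR change model; reported Deff is .012158
display 2.14799^2     // rcontrol, CKD diagnosis model; reported Deff is 4.61385
```

A Deff near 0.01 means the design-based variance is about 1% of the simple-random-sampling variance, while a Deff near 4.6 means it is more than four times as large.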

> Another preferred option is to use panel methods such as -xtmixed- with the clusters specified as panels. Even if you don't have covariates (and in an RCT you will need to make a case for including them), these are often preferred.

This is preferred because ... ?

. xtmixed gfr_achg rcontrol

Mixed-effects REML regression                   Number of obs      =       159

                                                Wald chi2(1)       =      0.30
Log restricted-likelihood = -589.18999          Prob > chi2        =    0.5826

------------------------------------------------------------------------------
    gfr_achg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rcontrol |  -.9589666   1.744912    -0.55   0.583    -4.378931    2.460998
       _cons |  -.2553191   1.464482    -0.17   0.862    -3.125651    2.615013
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                sd(Residual) |   10.03998   .5630142      8.994974     11.2064
------------------------------------------------------------------------------
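Note that the xtmixed command as typed has no random-effects equation, which is presumably why its coefficients and standard errors match the unclustered regression and the only variance parameter reported is sd(Residual). A minimal sketch of the same models with the practice specified as the grouping level, assuming rsiteid is the practice identifier as in the cluster() runs:

```stata
* Random intercept for each practice (REML was the xtmixed default in this era)
xtmixed gfr_achg rcontrol || rsiteid:

* Analogous random-intercept logit for the CKD diagnosis outcome
xtmelogit ckd2 rcontrol if ckd1==0 || rsiteid:

* In later Stata releases the same models are fit with -mixed- and -melogit-
```

With only four practices, the between-practice variance will be estimated very imprecisely and may be estimated at or near zero, so these fits are a sketch of the syntax more than a recommendation.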

> More detail on your design might produce more detailed answers.

See above.

> Hope this helps, Jeph

It does help. Thanks!

Michael I. Lichter wrote:

> Hello, friends. I have a question about the analysis of data from cluster-randomized trials (CRTs). CRTs are experiments where subjects are randomly assigned to conditions (control, treatment) based on their group membership rather than being assigned individually as is usually the case in randomized controlled trials. In my study, the clusters are medical practices, so when a medical practice is assigned to a condition, all of the eligible patients therein are also assigned to the condition. CRTs should be analyzed using methods that take account of the clustering in the study design, of course.
>
> My question is this: For CRTs, is there any statistical reason for preferring the cluster() option on estimation commands (e.g., regress, logit) over the survey commands, or vice-versa? I've used both and the results are similar, but the survey commands estimate larger standard errors. If the answer is that they're both equally appropriate but produce different results because they use somewhat different methods of estimation, that's fine.

