
From: Joseph Coveney <jcoveney@bigplanet.com>
To: Statalist <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Comparing change in rates - frustrating problem, please help
Date: Sun, 01 Feb 2004 21:31:04 +0900

Kieran McCaul posted results from a randomized, parallel-group study to illustrate the use of conditional logistic regression. The study randomized households to an intervention designed to promote banning smoking in the home, and household smoking policy was measured before and after the intervention. On the basis of the results posted for that study, Kieran invited Ricardo and me to say what we think of advocating conditional logistic regression for assessing the efficacy of an intervention in before-and-after studies. I don't claim to speak for Ricardo, but his original question concerned imbalances in the baseline rates of the outcome between the two parallel intervention groups. Kieran's randomization appears to have been successful (or he used stratified randomization and didn't lose too many households to dropout), because the proportions of households banning smoking at baseline were nearly identical in the two intervention groups. With essentially identical baseline rates, there is little or no cause for concern about confounding by baseline, and including baseline as a covariate makes little statistical difference. Indeed, the conditional logistic regression approach and the so-called ANCOVA-like multiple logistic regression approach give essentially the same results in this balanced study. (I think the same would have held for Ricardo's study had the baseline rates of seatbelt use been similar in the two intervention groups.) But let's look at which approach is more suitable when the concern is, as it was for Ricardo, to analyze an intervention effect _in the face of an imbalance in the baseline rates of an outcome_.
If Kieran will indulge me once more in using a fictional dataset to illustrate a point, suppose that his randomization did not stratify on baseline household smoking policy and, by chance, suffered an unfortunate imbalance: say a 50 : 50 ratio of banning to nonbanning households at baseline in the nonintervention group, but a 75 : 25 ratio in the intervention group. In the nonintervention group, suppose that 2 of the 50 households that previously banned smoking now permit it, a worsening of 4% (if your health policy is to ban smoking), and that only 1 of the 50 households that didn't ban smoking now does so, a meager improvement of 2%. In the intervention group, suppose that 4 of the 75 households that banned smoking at baseline switched to permitting smoking in the home after the intervention, and that 2 of the 25 households that didn't ban smoking switched to banning it as a result of the intervention. The results of the intervention are a slightly greater worsening of 5.3% (compared with 4%) among the formerly banning households, but a much greater improvement of 8% (compared with 2%) among the formerly permissive households. Now, the effects of the intervention are no great shakes, but I think it would be safe to say that they're not *nothing*, especially if you somehow take into account the possible confounding effect of the chance imbalance in baseline policy between treatment groups. But by the conditional logistic regression approach, it *is* nothing: the odds ratio is 0.5 in both the nonintervention and the intervention groups (McNemar's test uses only the off-diagonal values and ignores the diagonal values), so the ratio of the two odds ratios is 1.0. This is what conditional logistic regression dutifully reports: the odds ratio for the period term is 0.5, and the odds ratio for the interaction term is 1.0, with a Z-statistic of 0.00 and a p-value of 1.00.
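The odds ratios quoted above can be checked by hand from the fictitious counts. For paired before-and-after data, the conditional (McNemar) odds ratio for period in group g is the ratio of discordant counts, b_g / c_g, where b_g counts households that ban smoking only after the intervention period and c_g counts those that banned only before. The b_g, c_g notation is my own shorthand for this sketch, not something taken from Stata's output.

```latex
$$
\begin{aligned}
\text{Nonintervention:}\quad & \tfrac{2}{50} = 4\%\ \text{worsening},\quad \tfrac{1}{50} = 2\%\ \text{improvement},\quad \widehat{\mathrm{OR}}_0 = \frac{b_0}{c_0} = \frac{1}{2} = 0.5 \\
\text{Intervention:}\quad & \tfrac{4}{75} \approx 5.3\%\ \text{worsening},\quad \tfrac{2}{25} = 8\%\ \text{improvement},\quad \widehat{\mathrm{OR}}_1 = \frac{b_1}{c_1} = \frac{2}{4} = 0.5 \\
\text{Interaction:}\quad & \frac{\widehat{\mathrm{OR}}_1}{\widehat{\mathrm{OR}}_0} = \frac{0.5}{0.5} = 1.0
\end{aligned}
$$
```

Because only b_g and c_g enter, the households whose policy did not change (48 and 49 in the nonintervention group, 71 and 23 in the intervention group) contribute nothing to either estimate, which is why the baseline imbalance cannot register in the conditional analysis.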
Granted, the confidence interval encompasses a lot, but the point estimate and hypothesis test for the interaction term (which is ostensibly the effect of the intervention) just don't give the same take-home message as inspection of the data. So my conclusion differs from Kieran's: I don't think that conditional logistic regression is valid for testing differences between treatment effects (differences between treatment differences, which are between-subject effects) in parallel-group designs with a repeated binary outcome measure, especially when there are baseline differences in the outcome measure, which the conditional logistic model ignores. In contrast, the ANCOVA-like, baseline-as-covariate multiple regression approach does provide a separate, and I think competent, handling of baseline differences and their potential for confounding. In the fictitious example, this approach shows the pronounced effect of baseline smoking policy, as expected, and it shows that the odds ratio for the intervention isn't 1.0 once baseline differences between intervention groups are taken into account. The saturated model (with the interaction term) also helps to put the potential for confounding into perspective. (The do-file for all of this is below for anyone interested.)

It seems that at least some of the discrepancy between the two approaches reflects Simpson's paradox. This is the same underlying phenomenon that biases logistic regression coefficients (and the coefficients of nonlinear regression models in general) when important covariates are left out of the model. It is the subject of Frank E. Harrell Jr.'s lecture at the URL given in my last posting, and it relates to the "noncollapsibility of odds ratios" that epidemiologists sometimes refer to.
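To see what the saturated baseline-as-covariate model is working with, the odds ratios below are hand calculations from the fictitious end-of-study counts in the four baseline-by-group cells. Writing the saturated model as logit Pr(ban1) = β0 + β1 ban0 + β2 intervention + β3 iac (the β notation is mine; the variable names follow the do-file), a model with four parameters and four covariate patterns reproduces every cell proportion exactly, so its exponentiated coefficients must equal these ratios. They are not pasted from Stata output.

```latex
$$
\begin{aligned}
e^{\beta_2} &= \frac{2/23}{1/49} = \frac{98}{23} \approx 4.26 && \text{(intervention, households not banning at baseline)} \\
e^{\beta_2+\beta_3} &= \frac{71/4}{48/2} = \frac{71}{96} \approx 0.74 && \text{(intervention, households banning at baseline)} \\
e^{\beta_3} &= \frac{71/96}{98/23} = \frac{1633}{9408} \approx 0.17 && \text{(interaction)} \\
e^{\beta_1} &= \frac{48/2}{1/49} = 1176 && \text{(baseline ban, nonintervention group)}
\end{aligned}
$$
```

The very large baseline odds ratio is the pronounced effect of baseline smoking policy, and neither stratum-specific intervention odds ratio is 1.0. The main-effects model (model B in the do-file) summarizes the intervention with a single odds ratio across both baseline strata; its value comes from the fitted model and isn't reproduced here.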
In fairness to us all (Kieran, Ricardo and me), it seems that the question of which approach is better isn't completely settled even for *linear* models, where the noncollapsibility of odds ratios and the incidental parameters problem don't apply. There is a thread on sci.stat.consult ("Repeated measures and including time zero response as baseline covariate") that Frank Harrell started on May 7 of last year. Professor Harrell wrote a well-received book on regression modeling and is now chairman of a department of biostatistics, yet even he asks, "Has anyone come across some practical guidance for when to include the first measured response (at time zero) as a baseline covariate as opposed to the first repeated measurement in a longitudinal data analysis?"

Joseph Coveney

-------------------------------------------------------------------------------
clear
tempfile tmp
set obs 100
generate byte ban0 = _n > _N / 4
generate byte ban1 = ban0
replace ban1 = !ban1 in 50/53
replace ban1 = !ban1 in 1/2
*
* Intervention group
*
display 4 / 75 // switching by banners
display 2 / 25 // switching by permitters
mcc ban1 ban0
generate byte intervention = 1
save `tmp'
clear
set obs 100
generate byte ban0 = _n > _N / 2
generate byte ban1 = ban0
replace ban1 = !ban1 in 50/52
*
* Nonintervention group
*
display 2/50 // switching by banners
display 1/50 // switching by permitters
mcc ban1 ban0
generate byte intervention = 0
append using `tmp'
erase `tmp'
generate byte iac = ban0 * intervention
generate int id = _n
* ANCOVA-like models: saturated (A), main effects (B), then baseline only
logistic ban1 ban0 intervention iac, or nolog
estimates store A
logistic ban1 ban0 intervention, or nolog
estimates store B
lrtest A B
logistic ban1 ban0, or nolog
lrtest A .
lrtest B .
* One record per household per period for the conditional analysis
quietly {
    reshape long ban, i(id) j(period)
    replace iac = period * intervention
}
clogit ban period intervention iac, group(id) or nolog
xtgee ban period intervention iac, i(id) family(binomial) link(logit) ///
    corr(exchangeable) nmp eform nolog
exit
