Stata The Stata listserver

st: Comparing change in rates - frustrating problem: questionable results


From   Ricardo Ovaldia <ovaldia@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Comparing change in rates - frustrating problem: questionable results
Date   Tue, 10 Feb 2004 10:41:53 -0800 (PST)

Continuing a previous discussion, I applied both
Joseph's and Kieran's methods to a large set of the
seat belt intervention data and obtained some
questionable results. Here is a summary table:

--------------------------------------------------
     case |     N    pre           post
----------+---------------------------------------
        0 |   140     60(42.9%)    72(51.4%)
        1 |   139     53(38.1%)    89(64.0%)
--------------------------------------------------

In the control group (case=0) we see an increase from
42.9% to 51.4% (diff=8.5 percentage points), whereas in
the intervention group (case=1) we see an increase from
38.1% to 64.0% (diff=25.9 percentage points). So the
increase appears to be greater in the intervention group
than in the control group, i.e., the intervention seems
to work.
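
As a quick check on the arithmetic above, here is a minimal Python sketch that recomputes the changes from the raw counts in the summary table. The variable names are mine, and nothing here is a significance test. From unrounded counts the control change comes out at 8.6 rather than 8.5 percentage points; the 8.5 arises from subtracting the rounded percentages.

```python
# Counts from the summary table: (N, belted before, belted after).
groups = {
    "control (case=0)": (140, 60, 72),
    "intervention (case=1)": (139, 53, 89),
}


def change_in_points(n, pre, post):
    """Change in the seat belt rate, in percentage points."""
    return 100.0 * (post - pre) / n


changes = {name: change_in_points(*counts) for name, counts in groups.items()}
for name, delta in changes.items():
    print(f"{name}: {delta:.1f} percentage points")

# Crude difference in differences on the percentage scale.
did = changes["intervention (case=1)"] - changes["control (case=0)"]
print(f"difference in differences: {did:.1f} percentage points")
```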

Here are the results of Joseph's MANOVA-like approach:
. xi:logistic    post pre case i.case*pre
i.case            _Icase_0-1          (naturally coded; _Icase_0 omitted)
i.case*pre        _IcasXpre_#         (coded as above)

note: _Icase_1 dropped due to collinearity
note: pre dropped due to collinearity

Logistic regression                               Number of obs   =        279
                                                  LR chi2(3)      =      49.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -165.12038                       Pseudo R2       =     0.1312

------------------------------------------------------------------------------
        post | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   6.824176   2.644293     4.96   0.000     3.193147    14.58416
        case |   2.175824   .7000504     2.42   0.016     1.158132      4.0878
 _IcasXpre_1 |   .7868083   .4614124    -0.41   0.683     .2492838    2.483384
------------------------------------------------------------------------------


and Kieran's conditional logistic method yields:



. xi:clogit   period  i.seatbelt*i.case, group(participantid) or nolog
i.seatbelt        _Iseatbelt_0-1      (naturally coded; _Iseatbelt_0 omitted)
i.case            _Icase_0-1          (naturally coded; _Icase_0 omitted)
i.sea~t*i.case    _IseaXcas_#_#       (coded as above)
note: _Icase_1 omitted due to no within-group variance.

Conditional (fixed-effects) logistic regression   Number of obs   =        558
                                                  LR chi2(2)      =      31.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -177.84119                       Pseudo R2       =     0.0804

------------------------------------------------------------------------------
      period | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iseatbelt_1 |   1.857143   .6156369     1.87   0.062     .9697834    3.556443
_IseaXcas_~1 |   2.961538   1.503159     2.14   0.032     1.095169    8.008541
------------------------------------------------------------------------------
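
For paired binary data, the conditional odds ratio within a group equals the ratio of the two kinds of discordant pairs (switched to belted over switched to unbelted). Assuming all 279 participants were observed at both periods (558 = 2 x 279), the clogit odds ratios can be combined with the net changes in the summary table to back out those discordant counts. The sketch below is my own reconstruction, not part of the original output, and the function name is hypothetical.

```python
from math import sqrt


def discordant_pairs(odds_ratio, net_change):
    """Solve b / c = odds_ratio and b - c = net_change for (b, c).

    b = participants who switched from unbelted to belted,
    c = participants who switched from belted to unbelted.
    """
    c = net_change / (odds_ratio - 1.0)
    return odds_ratio * c, c


or_control = 1.857143                        # _Iseatbelt_1
or_intervention = 1.857143 * 2.961538        # main effect times interaction

b0, c0 = discordant_pairs(or_control, 72 - 60)        # control net gain: 12
b1, c1 = discordant_pairs(or_intervention, 89 - 53)   # intervention net gain: 36

print(f"control:      {b0:.2f} gained, {c0:.2f} lost")
print(f"intervention: {b1:.2f} gained, {c1:.2f} lost")

# Cross-check: the usual SE of a log odds ratio from discordant pairs,
# converted to the odds-ratio scale, should match the reported Std. Err.
se_control = or_control * sqrt(1.0 / b0 + 1.0 / c0)
print(f"implied Std. Err. for _Iseatbelt_1: {se_control:.4f}")
```

If the reconstruction is right, the conditional model is comparing roughly 26 versus 14 switches in the control group with 44 versus 8 in the intervention group, and ignores everyone whose seat belt use did not change.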


With Joseph's method the p-value for the interaction
is 0.683, indicating no treatment effect.
But with Kieran's method the p-value is 0.032,
indicating a significant treatment effect. Looking at
the actual data, I trust the results from the
conditional logistic more than those from the
"MANOVA"-like approach, given that the baseline rates
are similar.
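
Both p-values come from Wald tests on the log odds ratio scale. As a sanity check that the two tables are being read correctly, this small Python sketch recovers the z statistics and 95% confidence limits from the reported odds ratios and standard errors, using the delta-method relation SE(log OR) = SE(OR) / OR. The function name is mine.

```python
from math import exp, log


def wald_summary(odds_ratio, se_odds_ratio, z_crit=1.959964):
    """Return (z, lower, upper) for an odds ratio and its standard error."""
    log_or = log(odds_ratio)
    se_log = se_odds_ratio / odds_ratio      # delta method
    z = log_or / se_log
    return z, exp(log_or - z_crit * se_log), exp(log_or + z_crit * se_log)


# Interaction terms as reported in the two outputs above.
z_logit, lo_logit, hi_logit = wald_summary(0.7868083, 0.4614124)   # _IcasXpre_1
z_clog, lo_clog, hi_clog = wald_summary(2.961538, 1.503159)        # _IseaXcas_~1

print(f"logistic: z = {z_logit:.2f}, 95% CI = ({lo_logit:.4f}, {hi_logit:.4f})")
print(f"clogit:   z = {z_clog:.2f}, 95% CI = ({lo_clog:.4f}, {hi_clog:.4f})")
```

Note that the two interaction terms do not estimate the same quantity: one multiplies the odds ratio for pre by case, the other multiplies the within-person odds ratio for period by case, so their p-values need not agree.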

What am I missing?

Thank you,
Ricardo.



 



--- Ricardo Ovaldia <ovaldia@yahoo.com> wrote:
> Thank you Joseph and Kieran. Obviously this was not the easy
> question I thought it was. I have spent several days contemplating
> the answers and playing around with my data. Although I find
> Kieran's conditional logistic approach appealing, I understand and
> agree with Joseph's concerns and objections. Faced with the need to
> analyze these data and the eventual submission for publication, I
> fear that reviewers may disagree with whichever method I select.
> The issue becomes more complicated when one considers the effect of
> additional covariates such as sex on the intervention.
> 
> Regardless of all this, I tremendously appreciate Joseph's and
> Kieran's comments and the time they spent thinking about this
> problem.
> 
> Ricardo.
> 
> 
> --- Joseph Coveney <jcoveney@bigplanet.com> wrote:
> > 
> > Kieran McCaul posted results from a randomized parallel-group
> > design study to illustrate the use of conditional logistic
> > regression.  The study randomized households to an intervention
> > designed to promote banning of smoking in the home.  Policy in the
> > home was measured before and after intervention.  Kieran invited
> > Ricardo and me to respond with what we think of advocating
> > conditional logistic regression to assess the efficacy of the
> > intervention for before-and-after studies, based upon the results
> > posted for that study.
> > 
> > I don't claim to speak for Ricardo, but his original question
> > related to imbalances in the baseline rates of the outcome between
> > the two parallel intervention groups.  It appears that Kieran's
> > study was successful in its randomization (or used stratified
> > randomization and didn't lose too many households to dropout),
> > because the proportions of households banning smoking at baseline
> > were nearly identical between the intervention groups.  With
> > essentially identical baseline rates, there would be little or no
> > cause for concern about confounding due to baseline and little
> > statistical difference in including baseline as a covariate.  And,
> > in fact, both the conditional logistic regression approach and the
> > so-called ANCOVA-like multiple logistic regression approach give
> > essentially similar results in this balanced study.  (I think the
> > same would have obtained for Ricardo's study had the baseline
> > rates of seatbelt use been similar between the two intervention
> > groups.)
> > 
> > But, let's look at the issue of which approach is more suitable
> > when the concern is, as it was for Ricardo, to analyze an
> > intervention effect _in the face of an imbalance in the baseline
> > rates of an outcome_.
> > 
> > If Kieran will indulge me one more time to use a fictional dataset
> > to illustrate a point, let's say that Kieran's randomization
> > method did not stratify on baseline household smoking policy, and
> > suffered an unfortunate imbalance due to chance, for instance a
> > 50 : 50 ratio of households banning smoking at baseline in the
> > nonintervention group, but a 75 : 25 ratio in the intervention
> > group.  Let's say that 2 of the 50 households that previously
> > banned smoking in the nonintervention group now permit it, a
> > worsening of 4% (if your health policy is to ban smoking), and
> > that only 1 of the 50 households that didn't ban smoking now does
> > so in the nonintervention group, a meager improvement of 2%.
> > Let's say that 4 of the 75 households that banned smoking at
> > baseline switched and permitted smoking in the home after the
> > intervention, and 2 of the 25 households that didn't ban smoking
> > switched as a result of the intervention.  The results of the
> > intervention are a slightly greater 5.3% worsening (compare to 4%)
> > among the formerly banning households, but a much greater 8%
> > (compare to 2%) improvement among the formerly permissive
> > households.
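
The switching rates in this fictional example can be recomputed directly. A minimal Python sketch follows; the dictionary layout and labels are mine, and the counts are the fictional ones quoted above.

```python
# Fictional counts: (households in the baseline category, households that switched).
switches = {
    ("nonintervention", "banned, now permit"): (50, 2),
    ("nonintervention", "permitted, now ban"): (50, 1),
    ("intervention", "banned, now permit"): (75, 4),
    ("intervention", "permitted, now ban"): (25, 2),
}

# Percentage of each baseline category that changed policy.
rates = {key: 100.0 * k / n for key, (n, k) in switches.items()}

for (group, direction), pct in rates.items():
    print(f"{group:15s} {direction:20s} {pct:.1f}%")
```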
> > 
> > Now, the effects of intervention are no great shakes, but I think
> > that it would be safe to say that it's not *nothing*, especially
> > if you somehow take into account the possible confounding effect
> > of the chance unfortunate imbalance in baseline policy between
> > treatment groups.
> > 
> > But, by the conditional logistic regression approach, it *is*
> > nothing--the odds ratio for both nonintervention and intervention
> > groups is 0.5 (McNemar's test uses only the off-diagonal values
> > and ignores the diagonal values) so the ratio of the two odds
> > ratios is 1.0, and this is what the conditional logistic
> > regression dutifully reports:  the period term is 0.5 and the
> > interaction term's odds ratio is 1.0 with a Z-statistic of 0.00
> > and a p-value of 1.00.  Granted, the confidence interval
> > encompasses a lot, but the point estimate and hypothesis test for
> > the interaction term (which is ostensibly the effect of
> > intervention) just don't give the same take-home message as
> > inspection of the data.  So, my conclusion differs from Kieran's
> > on this; I don't think that conditional logistic regression is
> > valid to test for differences between treatment effects
> > (differences between treatment differences, which are
> > between-subject effects) in parallel-group designs with a repeated
> > binary outcome measure, especially in the presence of baseline
> > differences in the outcome measure, which are ignored in the
> > conditional logistic model.
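
The odds ratios quoted for the conditional approach follow from the discordant (switching) households alone. A short Python sketch with the fictional counts, using exact fractions; the helper name is mine.

```python
from fractions import Fraction


def conditional_or(improved, worsened):
    """Matched-pairs odds ratio: discordant pairs that improved over those that worsened."""
    return Fraction(improved, worsened)


# Improved = started banning smoking; worsened = stopped banning smoking.
or_nonintervention = conditional_or(1, 2)
or_intervention = conditional_or(2, 4)

# This ratio is what the interaction term estimates.
ratio_of_ors = or_intervention / or_nonintervention

print(f"nonintervention OR: {float(or_nonintervention)}")
print(f"intervention OR:    {float(or_intervention)}")
print(f"ratio of ORs:       {float(ratio_of_ors)}")
```

The 48 + 49 unchanged nonintervention households and the 71 + 23 unchanged intervention households never enter the calculation, which is why the differing baseline mix has no influence on the result.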
> > 
> > In contrast, the ANCOVA-like, baseline-as-covariate multiple
> > regression approach does provide a separate, and I think
> > competent, handling of baseline differences and their potential
> > for confounding.  In the fictitious example, this approach shows
> > the pronounced effect of baseline smoking policy as expected, and
> > it shows that the odds ratio for intervention isn't 1.0 given
> > baseline differences between intervention groups.  The saturated
> > model (with the interaction term) also helps to put the potential
> > for confounding into perspective.  (The do-file for all of this is
> > below for anyone interested.)
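
The referenced do-file is not reproduced in this message. As a rough stand-in, the Python sketch below computes the intervention odds ratio for banning smoking at follow-up within each baseline stratum of the fictional data (which is what a saturated model reproduces), plus a Mantel-Haenszel pooled odds ratio. Mantel-Haenszel is my substitution for the main-effects logistic fit, not the method used in the post, so it only approximates what that model would report.

```python
from fractions import Fraction

# Follow-up counts by baseline policy, from the fictional example:
# (intervention ban, intervention permit, nonintervention ban, nonintervention permit)
strata = {
    "banned at baseline": (71, 4, 48, 2),
    "permitted at baseline": (2, 23, 1, 49),
}

# Stratum-specific odds ratios for intervention vs. nonintervention.
stratum_or = {
    name: Fraction(a * d, b * c) for name, (a, b, c, d) in strata.items()
}

# Mantel-Haenszel pooled odds ratio across the two baseline strata.
numerator = sum(Fraction(a * d, a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(Fraction(b * c, a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

for name, value in stratum_or.items():
    print(f"{name}: OR = {float(value):.3f}")
print(f"Mantel-Haenszel pooled OR = {float(mh_or):.3f}")
```

Under these assumptions the baseline-adjusted odds ratio is not 1.0, and the two stratum-specific odds ratios lie on opposite sides of 1, which is the kind of pattern the interaction term in the saturated model would expose.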
> > 
> > It seems that at least some of the discrepancy between the two
> > approaches reflects Simpson's paradox.  This is the same
> > underlying phenomenon that results in bias in logistic regression
> > coefficients (and in nonlinear regression, in general) when
> > important covariates are left out of the model.  This is what
> > Frank E. Harrell Jr.'s lecture dealt with in the URL given in my
> > last posting.  And it relates to the "noncollapsibility of odds
> > ratios" that epidemiologists sometimes refer to.
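
Noncollapsibility can be seen with a tiny made-up example that has nothing to do with either study's data: two equally sized strata, the same exposure split in each (so no confounding), and a stratum-specific odds ratio of 9 in both. All numbers below are invented for illustration.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


# (risk if unexposed, risk if exposed) in two equally sized strata.
strata_risks = {
    "low-risk stratum": (Fraction(1, 10), Fraction(1, 2)),
    "high-risk stratum": (Fraction(1, 2), Fraction(9, 10)),
}

# Odds ratio within each stratum.
conditional_ors = {
    name: odds(p1) / odds(p0) for name, (p0, p1) in strata_risks.items()
}

# Collapse over strata: average the risks (equal weights), then form the OR.
n_strata = len(strata_risks)
marginal_p0 = sum(p0 for p0, _ in strata_risks.values()) / n_strata
marginal_p1 = sum(p1 for _, p1 in strata_risks.values()) / n_strata
marginal_or = odds(marginal_p1) / odds(marginal_p0)

print({name: float(v) for name, v in conditional_ors.items()})
print(f"marginal OR = {float(marginal_or):.3f}")
```

The odds ratio is 9 in each stratum but only about 5.4 after collapsing, even though the stratifying variable is balanced across exposure. Omitting a strong predictor such as baseline status can therefore shift a logistic odds ratio without any confounding at all.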
> > 
> > In fairness to us all (Kieran, Ricardo and me), it seems that the
> > matter of which approach is better isn't completely settled even
> > for *linear* models, where this noncollapsibility-of-odds-ratios
> > phenomenon and the incidental parameters problem don't apply:
> > there is a thread ("Repeated measures and
> 
=== message truncated ===


=====
Ricardo Ovaldia, MS
Statistician 
Oklahoma City, OK
