# Re: st: clustering in proportional hazards models with stata/mp 10.0 - conditional logistic

 From Ricardo Ovaldia To statalist@hsphsun2.harvard.edu Subject Re: st: clustering in proportional hazards models with stata/mp 10.0 - conditional logistic Date Wed, 24 Oct 2007 05:47:12 -0700 (PDT)

```
Thank you Dr. Gould for a thorough and clear
explanation.

I have a similar problem related to conditional
logistic regression. I have data from a multi-center
(7 clinics) study. I analyzed the data using
conditional logistic grouping on clinic. I was asked
to defend my method, because previous analyses on
these data were performed using indicator variables or
simply using a robust variance estimator.

I am planning on using the explanation from Dr. Gould's
post; however, the argument that I would use for
conditional logistic is the same as that presented for
the indicator variables (dummies). So I am missing
something: what is the difference? By the way, the
results I obtained using conditional logistic and
dummies are very similar.

Thank you,
Ricardo

--- "William Gould, StataCorp LP" <wgould@stata.com>
wrote:

> Daniel Koralek <dkoralek@unc.edu> writes about using -stcox- on
> individual data where each individual was recruited from one of
> ten centers.  He is concerned that which center may influence
> survival because "different foods eaten in different regions may
> influence nutrients".
>
> He considers three ways of dealing with this problem,
>
>        . stcox ..., vce(cluster center)                  (1)
>
>        . xi:  stcox ... i.center                         (2)
>
>        . stcox ..., stratify(center)                     (3)
>
> and, of course, he could ignore center altogether
>
>        . stcox ... [center completely omitted]           (0)
>
> As a matter of notation, let's assume the other covariates in the
> models (the ... part) are x1 and x2.
>
> My comments are as follows:
>
> Re solution (0):
>
>      This solution assumes center has no effect, and Daniel
>      raised concerns that it does, so the solution is
>      inappropriate.
>
> Re solution (1):
>
>      This solution also assumes center has no effect, but it
>      conservatively handles the situation where the individual
>      patients are overly homogeneous, which is to say, not
>      independent draws.  Actually, I didn't say that exactly
>      right for the Cox model, but what I said implies what I
>      should have said, which is that the selections of the
>      failures from the risk pools at each failure time are not
>      independent.
>
>      Daniel tried solution (1) and found that the standard errors
>      changed, but the reported coefficients did not.  Exactly.
>      Under solution (1), because center has no effect, the
>      coefficients estimated the standard way are fine, although
>      perhaps inefficient.  The lack of independence, however,
>      means standard errors usually will be understated and
>      -vce(cluster center)- handles that.
>
> Re solution (2):
>
>      This solution assumes that center does have a direct
>      effect on survival, and it constrains the effect to be a
>      multiplicative shift in the baseline hazard function.
>      The baseline hazard function ho(t) is a function of
>      time, such as
>
>             ho(t)
>               |             .
>               | .         .   .
>               |. .       .
>               |   .    .
>               |     . .
>               |
>               +-------------------  time
>
>       FYI, the baseline survival function So(t) is the
>       integral of ho(t), negated and exponentiated.  There's
>       nothing deep there; that's just the mathematical
>       formula for calculating one from the other.  I switched
>       to hazard functions, however, because the hazard
>       function is the natural metric for the Cox model.  The
>       hazard rate for a particular individual in the data at a
>       particular time is just ho(t)*exp(X_i*b), where X_i are
>       the individual's covariates at time t.  That's why I
>       said solution (2) constrains each center's effect to be
>       a multiplicative shift of ho(t).
>
>       Concerning our use of dummy variables for the centers, we
>       would like to think that we chose this particular
>       functional form because it is truly representative of how
>       the different foods served in the different centers
>       influence the hazard, but the fact is that we choose this
>       functional form because it is convenient; the effect of
>       each center is wrapped up in just a single coefficient.
>
>       This is not a bad approach.
>
> Re solution (2.5):
>
>       Alright, I admit that Daniel did not include a solution
>       (2.5), but I want to add it; it will help to understand
>       solution (2), and is often useful in and of itself.
>
>       Solution (2) was
>
>        . xi:  stcox ... i.center                         (2)
>
>       Solution 2.5 is
>
>        . xi:  stcox ... i.center i.center*x1             (2.5)
>
>       In this solution, we assume that center does not merely
>       shift the hazard function in a multiplicative way, we
>       assume that center modifies the effect of x1.
>
>       Actually, there are a lot of solution (2.5)'s.  I could
>       have chosen x2 rather than x1,
>
>        . xi:  stcox ... i.center i.center*x2
>
>       or even x1 and x2,
>
>        . xi:  stcox ... i.center i.center*x1 i.center*x2
>
>      Anyway, in this modeling-based approach, we need to think
>      carefully about how the different foods served in the
>      centers affect the shifting of the baseline hazard
>      function.  Is it just a shift (solution 2), or do the
>      different foods modify the effect of x1 (solution 2.5), or
>      something else?
>
>      We also need to appreciate that we are assuming the SHAPE
>      of the hazard function is the same across all centers and
>      that we are just moving it up and down, multiplicatively.
>
>
> Re solution (3):
>
>      In this solution, we let the baseline hazard be different
>      for each center.  That is, rather than assuming the
>      baseline function is
>
>             ho(t)
>               |             .
>               | .         .   .
>               |. .       .
>               |   .    .
>               |     . .
>               |
>               +-------------------  time
>
>       for all centers, albeit shifted, we assume that the above
>       picture might be the baseline function for center 1, and
>       for center 2, the function could be completely different:
>
>             ho(t)
>               |    . . .
>               |   .     .
>               |. .       .
>               | .         .
>               |            . . .
>               |
>               +-------------------  time
>
=== message truncated ===

Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
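
The commands discussed in the thread can be tried side by side on toy data. The following is a minimal sketch, not anything taken from the original posts: the data are simulated, the variable names `t`, `failed`, and `y` and all effect sizes are made up for illustration, and it assumes a Stata release that provides `runiform()` and `rnormal()`. The quoted message writes `stratify(center)`; the sketch uses `strata()`, which is the option name given in the `stcox` documentation. The last three commands mirror the three analyses Ricardo describes for the logistic case (robust variance, clinic dummies, conditional logistic).

```stata
* Hypothetical simulated data: 7 clinics ("center"), 100 patients each.
clear
set seed 20071024
set obs 700
generate center = ceil(_n/100)
generate x1 = rnormal()
generate x2 = rnormal()

* Exponential survival times whose rate depends on x1, x2, and center
* (made-up coefficients); no censoring in this toy example.
generate t = -ln(runiform()) / (0.1*exp(0.5*x1 - 0.3*x2 + 0.2*center))
generate byte failed = 1
stset t, failure(failed)

* (0) center ignored
stcox x1 x2
* (1) same coefficients as (0), cluster-robust standard errors
stcox x1 x2, vce(cluster center)
* (2) one dummy per center: multiplicative shift of ho(t)
xi: stcox x1 x2 i.center
* (2.5) i.center*x1 expands to center dummies, x1, and their interactions
xi: stcox x2 i.center*x1
* (3) separate baseline hazard for each center
stcox x1 x2, strata(center)

* Logistic analogues on a simulated binary outcome y
generate byte y = runiform() < invlogit(-1 + 0.5*x1 - 0.3*x2 + 0.2*center)
logit y x1 x2, vce(cluster center)
xi: logit y x1 x2 i.center
clogit y x1 x2, group(center)
```

Comparing the coefficient tables from (0) and (1) should show identical point estimates with different standard errors, as described in the quoted reply; the other fits can be compared in the same way.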