From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: clustering in proportional hazards models with stata/mp 10.0
Date: Thu, 06 Sep 2007 09:38:24 -0500

Daniel Koralek <dkoralek@unc.edu> writes about using -stcox- on individual data where each individual was recruited from one of ten centers. He is concerned that center may influence survival because "different foods eaten in different regions may influence nutrients". He considers three ways of dealing with this problem,

    . stcox ..., vce(cluster center)             (1)

    . xi: stcox ... i.center                     (2)

    . stcox ..., stratify(center)                (3)

and, of course, he could ignore center altogether

    . stcox ...  [center completely omitted]     (0)

As a matter of notation, let's assume the other covariates in the models (the ... part) are x1 and x2. My comments are as follows:

Re solution (0):
This solution assumes center has no effect and Daniel has already raised concerns that it does, so the solution is inappropriate.

Re solution (1):
This solution also assumes center has no effect; it instead conservatively handles the situation where the individual patients within a center are overly homogeneous, which is to say, not independent draws. Actually, I didn't say that exactly right for the Cox model, but what I said implies what I should have said, which is that the selections of the failures from the risk pools at each failure time are not independent.

Daniel tried solution (1) and found that the standard errors changed, but the reported coefficients did not. Exactly. Under solution (1), because center has no effect, the coefficients estimated the standard way are fine, although perhaps inefficient. The lack of independence, however, means standard errors usually will be understated and -vce(cluster center)- handles that.

Re solution (2):
This solution assumes that center does have a direct effect on survival, and it constrains the effect to be a multiplicative shift in the baseline hazard function. The baseline hazard function ho(t) is a function of time, such as

    ho(t) |        .
          |     .     .  .
          |.  .            .
          |  .               .
          |                     .  .
          |
          +------------------- time

FYI, the baseline survival function So(t) is the integral of ho(t), negated and exponentiated. There's nothing deep there; that's just the mathematical formula for calculating one from the other. I switched to hazard functions, however, because the hazard function is the natural metric for the Cox model. The hazard rate for a particular individual in the data at a particular time is just ho(t)*exp(X_i*b), where X_i are the individual's covariates at time t. That's why I said solution (2) constrains each center's effect to be a multiplicative shift of ho(t).

Concerning our use of dummy variables for the centers, we would like to think that we chose this particular functional form because it is truly representative of how the different foods served in the different centers influence the hazard, but the fact is that we chose this functional form because it is convenient; the effect of each center is wrapped up in just a single coefficient. This is not a bad approach.

Re solution (2.5):
Alright, I admit that Daniel did not include a solution (2.5), but I want to add it; it will help to understand solution (2), and is often useful in and of itself. Solution (2) was

    . xi: stcox ... i.center                     (2)

Solution (2.5) is

    . xi: stcox ... i.center i.center*x1         (2.5)

In this solution, we assume that center does not merely shift the hazard function in a multiplicative way; we assume that center also modifies the effect of x1. Actually, there are a lot of solution (2.5)'s. I could have chosen x2 rather than x1,

    . xi: stcox ... i.center i.center*x2

or even x1 and x2,

    . xi: stcox ... i.center i.center*x1 i.center*x2

Anyway, in this modeling-based approach, we need to think carefully about how the different foods served in the centers affect the shifting of the baseline hazard function. Is it just a shift (solution 2), or do the different foods modify the effect of x1 (solution 2.5), or something else?
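
Solutions (0) through (2.5) can be tried side by side with a rough sketch like the one below. It is not Daniel's analysis: the data are simulated, and every variable name, coefficient, and sample size is an assumption made only for illustration. The (2.5) line lets xi's i.center*x1 term supply the main effects of center and x1, so x1 is not listed a second time. Run it from a do-file, because the // comments are not understood at the interactive prompt.

```stata
* Sketch only: simulated data standing in for the real study.
* All names and numbers below are hypothetical.
clear
set seed 20070906
set obs 2000
generate center = 1 + mod(_n, 10)          // ten hypothetical centers, 1..10
generate x1 = rnormal()
generate x2 = rnormal()
* exponential failure times with a multiplicative center shift built in
generate t = -ln(runiform()) / exp(0.5*x1 - 0.3*x2 + 0.1*center)
generate byte failed = 1                   // no censoring, to keep the sketch short
stset t, failure(failed)

stcox x1 x2                                // (0) center ignored
stcox x1 x2, vce(cluster center)           // (1) cluster-robust standard errors
xi: stcox x1 x2 i.center                   // (2) one multiplicative shift per center
xi: stcox x2 i.center*x1                   // (2.5) center also modifies the effect of x1
```

Comparing the output of (0) and (1) should show the same coefficients with different standard errors, which is what Daniel observed.
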
We also need to appreciate that we are assuming the SHAPE of the survivor function is the same across all centers and that we are just moving it up and down, multiplicatively.

Re solution (3):
In this solution, we let the baseline hazard be different for each center. That is, rather than assuming the baseline function is

    ho(t) |        .
          |     .     .  .
          |.  .            .
          |  .               .
          |                     .  .
          |
          +------------------- time

for all centers, albeit shifted, we assume that the above picture might be the baseline function for center 1, and for center 2, the function could be completely different:

    ho(t) |            .  .  .
          |        .            .
          |.  .  .
          |                        .  .
          |                              .  .  .
          |
          +------------------- time

and it could be different again for each of the other centers. I should emphasize that we do not actually assume the shape -- the data determine that -- we just ALLOW the shape to be different in this solution. In the previous solution, we CONSTRAINED the shape to be the same across centers, but what that single shape was was determined by the data.

Anyway, this new solution seems wonderful because, what could be more flexible? In this solution, however, we constrain the effects of x1 and x2 to be the same across centers. If the estimated hazard ratio for x1 is 1.5, we are saying that each center's hazard function -- yes, they are different -- is multiplied by THE SAME 1.5 for each unit increase in x1. The multiplicative shift is the same, but the underlying hazard functions are different.

Re solution (3.5):
Daniel didn't mention this solution, but what if he combined solution (3) with solution (2), which would be

    . xi: stcox ... i.center, stratify(center)

Answer: nothing new; the result is just solution (3). -stratify(center)- already allows the baseline hazards to be different, and that includes multiplicative shifts.
i.center would try to estimate a unique shift to apply to each unique baseline hazard for each center, and mechanically, that will not work because for any value of the shift, there is a corresponding baseline hazard that, when you combine the two, yields the same final result. Try this, and -stcox- will iterate forever.

Nonetheless, there is a variation on the above that will work. One example is

    . xi: stcox ... i.center*x1, stratify(center)

There is no i.center in the above -- stratify(center) handles that -- but we do allow the effect of x1 to be different across the centers. Given sufficient data, Daniel could do this if he thought center affected both the shape of the baseline hazard function and the effect of x1.

Final comment
-------------

Daniel must now choose, and he needs to base his choice on his science, judgment guided by experience, and whatever else he has that will inform him as to how the process that generates failures might reasonably work.

Daniel might object that he wants to make the minimum number of assumptions necessary. In that case, I would recommend solution (3.5), but I warn him, he may not have a sufficient amount of data for it.

Given sufficient data, Daniel could start with a solution (3.5) model and then work backwards, putting constraints on it that appear reasonable, the purpose being to simplify interpretation. Usually, however, we are not so lucky as to have sufficient data to do that, and then we must think hard about what is a reasonable starting place, and go at it from there.

A reasonable starting place might well be the dummy-variable shifts of solution (2), about which Daniel was so dismissive. Identifying shifts is a lot like measuring averages. It doesn't give you the richness of detail of more complete models, but it can be a good starting point for identifying what is going on.

One more warning about solution (3.5): It is not a panacea.
It, too, makes assumptions, such as multiplicative effects on hazard functions and that the functional form chosen by Daniel is correct. Given even more data, we could explore the validity of those assumptions, too. My point is that after solution (3.5), there are solutions 4, 5, 6, and on and on, each making fewer and fewer assumptions, and each requiring more and more data.

-- Bill
wgould@stata.com
