Statalist



RE: st: Testing non-proportionality in a discrete-time survival model in which the main effect of time is treated as continuous.


From   "Kevin Daley" <kevin.daley@mail.mcgill.ca>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Testing non-proportionality in a discrete-time survival model in which the main effect of time is treated as continuous.
Date   Sun, 18 Nov 2007 15:16:14 -0500

I'd like to thank Steven and Maarten for their help. It is much appreciated.
 
 Kevin

________________________________

From: owner-statalist@hsphsun2.harvard.edu on behalf of Steven Joel Hirsch Samuels
Sent: Fri 11/16/2007 5:46 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Testing non-proportionality in a discrete-time survival model in which the main effect of time is treated as continuous.



Kevin:

Answers to questions you didn't ask:

1.  If you do conditional logistic regression, you don't need a model 
for the 'time' variable. You can still add the interaction of Wages 
with the time dummy.

2. You can test the fit of your model with Stata's link test (see 
-linktest-). You can also test the fit of the polynomial model by 
comparing the 13-parameter model with the 4-parameter polynomial 
model. The reason is that the model with a parameter for each term 
represents a saturated 12th-order polynomial. However, this eight 
d.f. test is apt to have low power; you don't need all those extra 
terms.

3. In general, I cannot recommend that you use fourth-order 
polynomials. They can be so curvy that they give inaccurate 
predictions at the extremes of time. I recommend restricted cubic 
splines (see -mkspline-), which are linear at the endpoints, have 
limited curvature in the middle, and have low effective dimension.

4. The odds ratio of the logistic model is not a good approximation 
to the ratio of conditional probabilities when those probabilities 
are high. If this is the case at some times and covariate patterns, a 
discrete hazard model would be better; see -pgmhaz-.

5. If your endpoint is 'term' and not an actual date of drop-out, you 
have truly discrete data. If the data could have been grouped in 
other ways, in weeks, for example, then the logistic model is 
inconsistent. That is, if the model with certain parameters holds for 
one grouping, it will not hold for an arbitrary regrouping. In 
contrast, the parameters of a theoretical grouped or discrete hazard 
model are invariant to how the intervals are formed.

6. If wages change over the course of a student's school career, then 
initial wages might not be very relevant to drop-out decisions much 
later. This problem would be curable with time-dependent covariates.

7. Consider a frailty model. If there is a relatively large drop-out 
rate early, survivors could be very different. See -pgmhaz-.

8. If the drop-out rates are very heavy in the first two terms, then 
you might consider one model for those terms and one for the 
remainder. Arguing against that is your finding that the only 
interaction is with Wages.
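
The odds-ratio point and the regrouping point above can be checked 
numerically. What follows is only a minimal sketch, written in Python 
rather than Stata, and it is not part of the original analysis. It 
assumes a continuous-time proportional hazards process with a constant 
weekly rate inside each interval; the function names, the rate of 0.02 
per week, the hazard ratio of 2, and the 15-week "term" are all 
hypothetical values chosen for illustration.

```python
import math

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

def risk_ratio_for_odds_ratio(p0, odds_ratio):
    """Multiply the odds of p0 by odds_ratio; return the ratio p1 / p0."""
    o1 = odds(p0) * odds_ratio
    p1 = o1 / (1.0 + o1)
    return p1 / p0

def interval_hazard(rate, width):
    """P(event in an interval of given width | at risk at its start),
    assuming a constant hazard rate within the interval."""
    return 1.0 - math.exp(-rate * width)

def logit(p):
    return math.log(odds(p))

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def link_differences(base_rate, hazard_ratio, width):
    """Covariate effect on the cloglog and logit scales when the same
    continuous-time process is grouped into intervals of `width`."""
    h0 = interval_hazard(base_rate, width)
    h1 = interval_hazard(base_rate * hazard_ratio, width)
    return cloglog(h1) - cloglog(h0), logit(h1) - logit(h0)

if __name__ == "__main__":
    # Same odds ratio of 2, applied at a low and at a high baseline hazard.
    for p0 in (0.05, 0.50):
        print(p0, round(risk_ratio_for_odds_ratio(p0, 2.0), 3))

    # Same process grouped by week (width 1) and by 15-week term.
    for width in (1, 15):
        cll, lgt = link_differences(0.02, 2.0, width)
        print(width, round(cll, 4), round(lgt, 4))
```

Under these assumptions the cloglog coefficient equals the log hazard 
ratio for either interval width, while the logit coefficient changes 
when the weeks are regrouped into terms. Likewise, an odds ratio of 2 
is close to a probability ratio of 2 only when the baseline hazard is 
small.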

-Steven

On Nov 16, 2007, at 1:32 PM, Kevin Daley wrote:

> Hello, I have a question which, I must warn any reader, is not 
> strictly to do with Stata, and is largely statistical.  That being 
> said, I would really appreciate the input of any users familiar 
> with the estimation of discrete-time event-history/duration models.
>
>
>
> I'm running a discrete-time survival analysis of time-to-drop-out 
> on a sample of adult students.  While many people following the 
> same methodological approach (I'm running a logit model on a data-
> set arranged in person-terms at risk of drop-out) will model the 
> "main effect" of time using a series of dummy variables, I have 
> opted to use a more parsimonious specification, treating time as a 
> continuous variable, and modeling the hazard through a fourth order 
> set of polynomial terms.  This lets me cut down the number of 
> parameters by 13 and successfully addresses the problem of very low 
> risk sets and/or low hazard probabilities in the later terms, so I 
> would very much like to keep this specification if possible.  The 
> problem that I have run into is this: one of my predictors (wages) 
> has a strong effect, but when hazard profiles categorized by wages 
> are compared, it becomes clear that this effect is only truly 
> pronounced in the first two terms.  After the second term wages 
> tend not to predict much of a difference in the vertical elevation of 
> these hazard profiles. In other words, my model needs to adjust for 
> the non-proportionality of the effect of wages on the hazard of 
> drop-out.  Most of the material written on this model, however, 
> only deals with such adjustment when time has been specified using 
> the abovementioned dummies (one creates interactions between the 
> predictor and the time dummies).  I have come up with a solution 
> that seems to work quite well, but I'm not sure if it is 
> statistically legitimate.  Because the magnitude of the wage effect 
> in the first term and that in the second term are quite close and 
> the tiny amount of vertical differentiation after the first two 
> terms remains fairly constant over time, I simply created a dummy 
> variable dividing the sample into observations from term 1 or term 
> 2 and observations in any other term.  I then multiplied this dummy 
> by my continuous wage variable and entered this interaction (yet 
> not the time dummy) into the model already including the polynomial 
> specification of time and the wage variable.  All variables are highly 
> significant.  Am I breaking some basic rule of statistics, however, 
> by using an interactive term derived from a different specification 
> of the variable (time) than the main effect included in the model?
>
>
>
>  Some researchers adjust for non-proportionality using an 
> interaction based on a continuous specification of time (or the log 
> of time) when its main effect was categorized, so it seems that the 
> reverse would be just as reasonable (an interaction derived from a 
> categorized effect of time while the main effect was modeled as a 
> continuous variable). Again, however, I may be quite wrong and 
> would appreciate being corrected in as great detail as possible as 
> well as receiving any suggestions for how I might better adjust for 
> non-proportionality in this case.  Thank you very much (if you 
> managed to finish this monster email that is).
>
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

Steven  Samuels

sjhsamuels@earthlink.net
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441






