Statalist The Stata Listserver



RE: st: RE: transformation of a continuous variable for a logistic regression model


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: transformation of a continuous variable for a logistic regression model
Date   Wed, 19 Apr 2006 17:02:59 +0100

Sorry, but this to me is just a restatement of 
your previous posting, and addresses none of 
the points I raised. 

That aside, 

I don't understand how a quadratic function can 
have powers 3 3. Cubics in my experience are never 
appropriate for global fits unless there are clear 
dimensional grounds for using them, which seems unlikely 
here. 
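
For reference, and assuming the usual fractional polynomial convention
applies here: when a power is repeated, the second term is multiplied by
ln(X), so "powers 3 3" would denote a cubic term plus a cubic-times-log
term rather than a quadratic. That reading matches the -fracgen- output
quoted below, where X = (age+1)/10. The coefficient names b1 and b2 are
placeholders for illustration, not fitted values:

```latex
% Contribution of age to the linear predictor under powers (3, 3).
% \beta_1, \beta_2 are placeholder symbols, not estimates from the thread.
\[
  f(\text{age}) \;=\; \beta_1 X^{3} \;+\; \beta_2 X^{3}\ln X,
  \qquad X = \frac{\text{age} + 1}{10}
\]
```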

Nick 
n.j.cox@durham.ac.uk 

Suzy
 
> Thanks for your response, Nick. In a nutshell, age is not linear in the
> logit. I'm using the fracpoly command to identify the best functional
> form for age in the full model. The result returned by fracpoly was a
> quadratic function with powers 3 3 (which also looks good with
> fracplot). However, when I further assessed the model with the new age
> transformation using the boxtid command, the results were
> not favorable (the H0 was rejected). When I transformed another
> continuous variable in the same full logistic model (quadratic with
> powers 1 2 by fracpoly), the boxtid results were favorable, all graphs
> looked very good, and the diagnostics were good (linktest, etc.). I'm
> trying to understand why my results (fracpoly and boxtid) aren't
> consistent for the age variable, but are for all other
> continuous variables.
> 
> Nick Cox wrote:
> 
> >I am not clear what you think Statalist members know
> >that can help you here. For example, the field 
> >in which you are working, what the response variable 
> >-dmcat- means, and what other predictors there may be are all
> >hidden from view, so the chance of giving opinions 
> >drawing on substantive expertise is zero. Otherwise
> >put, you appear to be assuming that the choices
> >here can all be made on purely statistical criteria, 
> >an attitude which always worries me greatly. 
> >
> >What I have observed, as a kind of anthropologist of
> >statistical science, is that age plays very different
> >roles in different fields. Economists often seem 
> >to find that a quadratic in age does very nicely, 
> >whereas biostatisticians often seem to need 
> >more complicated representations, which seems
> >perfectly plausible given the complexities of
> >childhood, adolescence, etc. 
> >
> >Either way, -fracpoly- like other programs has
> >no inbuilt sensor (or censor) selecting theoretically or 
> >scientifically sensible functional forms. So, 
> >I suggest that you plot the curve implied against
> >age and think about it as something that needs justification
> >or interpretation independently from the data. 
> >
> >Nick 
> >n.j.cox@durham.ac.uk 
> >
> >Suzy
> > 
> >  
> >
> >>I am trying to transform one final continuous independent variable (age)
> >>in a logistic regression model. I've tried what I know that's available
> >>via Stata. For example, I used the fracpoly command and the best
> >>transformation was a second order polynomial with powers 3 3.
> >>
> >>Fractional polynomial model comparisons:
> >>---------------------------------------------------------------
> >>age              df       Deviance      Gain   P(term) Powers
> >>---------------------------------------------------------------
> >>Not in model      0       2098.129        --     --
> >>Linear            1       1834.224     0.000    0.000  1
> >>m = 1             2       1805.957    28.267    0.000  -1
> >>m = 2             4       1791.327    42.897    0.001  3 3
> >>m = 3             6       1790.526    43.699    0.670  -2 3 3
> >>m = 4             8       1788.431    45.793    0.351  -2 -2 3 3
> >>---------------------------------------------------------------
> >>
> >>
> >>I then used fracgen to generate the new age variables - age_1 and age_2.
> >>
> >>fracgen age 3 3
> >>-> gen double age_1 = X^3 
> >>-> gen double age_2 = X^3*ln(X) 
> >>   (where: X = (age+1)/10)
> >>
> >>
> >>
> >>
> >>
> >>The coefficients for age_1 and age_2 from the full logistic regression
> >>model:
> >>------------------------------------------------------------------------------
> >>       Y var | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> >>-------------+----------------------------------------------------------------
> >>       age_1 |   1.087994   .0093302     9.83   0.000      1.06986    1.106436
> >>       age_2 |   .9644247   .0037538    -9.31   0.000     .9570955    .9718101
> >>
> >>
> >>However the boxtid command rejected the null for both age_1 and age_2....
> >>
> >>  age_1    |   .0100805   .0007172     14.055   Nonlin. dev. 24.646  (P = 0.000)
> >>        p1 |   .0535714   .2122906      0.252
> >>------------------------------------------------------------------------------
> >>  age_2    |  -.0021756   .0004885     -4.453   Nonlin. dev. 7.894   (P = 0.005)
> >>        p1 |   3.864227   2.133377      1.811
> >>
> >>
> >>In all other respects, the preliminary diagnostics look good...
> >>
> >>Linktest:
> >>------------------------------------------------------------------------------
> >>       dmcat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> >>-------------+----------------------------------------------------------------
> >>        _hat |   .8900851   .1153855     7.71   0.000     .6639337    1.116236
> >>      _hatsq |  -.0319886   .0307101    -1.04   0.298    -.0921793    .0282022
> >>       _cons |  -.0450195   .1069617    -0.42   0.674    -.2546606    .1646215
> >>------------------------------------------------------------------------------
> >> lroc
> >>
> >>Logistic model for dmcat
> >>
> >>number of observations =     3354
> >>area under ROC curve   =   0.8647
> >>
> >>etc...etc...etc...
> >>
> >>My question is: should I be concerned with the results of the boxtid
> >>command? Is there something I've done incorrectly, or something else I
> >>can do or should do?
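
As a cross-check on the fracpoly comparison table quoted above, the
Gain and P(term) columns can be approximately recomputed from the
deviances alone. The sketch below is only an illustration in Python,
not Stata output. It assumes that Gain is the deviance reduction
relative to the linear model and that P(term) compares each model with
the one on the row above it, using a chi-squared test on the difference
in df; both assumptions are inferred from the numbers in the table, not
stated in the thread. The function names are made up for this example.

```python
import math

# Deviances and df copied from the fracpoly table quoted in the thread.
DEVIANCE = {
    "Not in model": 2098.129,
    "Linear": 1834.224,
    "m = 1": 1805.957,
    "m = 2": 1791.327,
    "m = 3": 1790.526,
    "m = 4": 1788.431,
}
DF = {"Not in model": 0, "Linear": 1, "m = 1": 2, "m = 2": 4, "m = 3": 6, "m = 4": 8}


def chi2_sf(x, df):
    """Upper-tail chi-squared probability (closed forms for df 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def gains(deviance):
    """Deviance reduction relative to the linear model (assumed meaning of Gain)."""
    base = deviance["Linear"]
    return {name: base - dev for name, dev in deviance.items() if name != "Not in model"}


def adjacent_tests(deviance, df):
    """Compare each row with the row above it (assumed meaning of P(term))."""
    names = list(deviance)
    rows = []
    for prev, cur in zip(names, names[1:]):
        diff = deviance[prev] - deviance[cur]  # drop in deviance
        ddf = df[cur] - df[prev]               # extra parameters used
        rows.append((cur, diff, ddf, chi2_sf(diff, ddf)))
    return rows


if __name__ == "__main__":
    g = gains(DEVIANCE)
    for name, diff, ddf, p in adjacent_tests(DEVIANCE, DF):
        gain = g.get(name, float("nan"))
        print(f"{name:8s} gain={gain:8.3f} diff={diff:8.3f} df={ddf} p={p:.3f}")
```

Under these assumptions the recomputed p-values for m = 2, m = 3 and
m = 4 round to the 0.001, 0.670 and 0.351 shown in the table, and the
gains agree with the Gain column to within rounding of the printed
deviances.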
