Statalist The Stata Listserver



st: RE: transformation of a continuous variable for a logistic regression model


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: transformation of a continuous variable for a logistic regression model
Date   Wed, 19 Apr 2006 14:12:18 +0100

I am not clear what you think Statalist members know
that can help you here. For example, the field 
in which you are working, what the response variable 
-dmcat- means, and what other predictors there may be are all
hidden from view, so the chance of giving opinions 
drawing on substantive expertise is zero. Otherwise
put, you appear to be assuming that the choices
here can all be made on purely statistical criteria, 
an attitude which always worries me greatly. 

What I have observed, as a kind of anthropologist of
statistical science, is that age plays very different
roles in different fields. Economists often seem 
to find that a quadratic in age does very nicely, 
whereas biostatisticians often seem to need 
more complicated representations, which seems
perfectly plausible given the complexities of
childhood, adolescence, etc. 

Either way, -fracpoly-, like other programs, has
no inbuilt sensor (or censor) that selects theoretically or 
scientifically sensible functional forms. So 
I suggest that you plot the implied curve against
age and think about it as something that needs justification
or interpretation independently of the data. 
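
A minimal sketch of such a plot, assuming the data with -age- are 
in memory and taking the scaling X = (age+1)/10, the powers 3 3 and 
the two odds ratios from the output quoted below. The names fp_X and 
fp_age are made up for illustration. Logging the odds ratios gives 
the coefficients on the logit scale, so the curve is the contribution 
of age to the log odds, up to an additive constant.

```stata
* Sketch only: the scaling and odds ratios are copied from the quoted
* -fracgen- and -logistic- output; fp_X and fp_age are hypothetical names.
generate double fp_X = (age + 1)/10

* ln(odds ratio) is the coefficient on the logit scale
generate double fp_age = ln(1.087994)*fp_X^3 + ln(.9644247)*fp_X^3*ln(fp_X)

* contribution of age to the log odds, up to an additive constant
twoway line fp_age age, sort ytitle("Contribution of age to log odds") xtitle("Age")
```

If the model was fitted with -fracpoly- itself, -fracplot- run 
straight afterwards should give a comparable picture with less typing.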

Nick 
n.j.cox@durham.ac.uk 

Suzy
 
> I am trying to transform one final continuous independent variable 
> (age) in a logistic regression model. I've tried what I know is 
> available in Stata. For example, I used the fracpoly command and the 
> best transformation was a second-degree fractional polynomial with 
> powers 3 3.
> 
> Fractional polynomial model comparisons:
> ---------------------------------------------------------------
> age              df       Deviance      Gain   P(term) Powers
> ---------------------------------------------------------------
> Not in model      0       2098.129        --     --
> Linear            1       1834.224     0.000    0.000  1
> m = 1             2       1805.957    28.267    0.000  -1
> m = 2             4       1791.327    42.897    0.001  3 3
> m = 3             6       1790.526    43.699    0.670  -2 3 3
> m = 4             8       1788.431    45.793    0.351  -2 -2 3 3
> ---------------------------------------------------------------
> 
> 
> I then used fracgen to generate the new age variables, age_1 and age_2.
> 
> fracgen age 3 3
> -> gen double age_1 = X^3 
> -> gen double age_2 = X^3*ln(X) 
>    (where: X = (age+1)/10)
> 
> The coefficients for age_1 and age_2 from the full logistic regression model:
> ------------------------------------------------------------------------------
>        Y var | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        age_1 |   1.087994   .0093302     9.83   0.000      1.06986    1.106436
>        age_2 |   .9644247   .0037538    -9.31   0.000     .9570955    .9718101
> 
> 
> However the boxtid command rejected the null for both age_1 and age_2....
> 
>   age_1    |   .0100805   .0007172     14.055   Nonlin. dev. 24.646  (P = 0.000)
>         p1 |   .0535714   .2122906      0.252
> ------------------------------------------------------------------------------
>   age_2    |  -.0021756   .0004885     -4.453   Nonlin. dev. 7.894   (P = 0.005)
>         p1 |   3.864227   2.133377      1.811
> 
> 
> In all other respects, the preliminary diagnostics look good...
> 
> Linktest:
> ------------------------------------------------------------------------------
>        dmcat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         _hat |   .8900851   .1153855     7.71   0.000     .6639337    1.116236
>       _hatsq |  -.0319886   .0307101    -1.04   0.298    -.0921793    .0282022
>        _cons |  -.0450195   .1069617    -0.42   0.674    -.2546606    .1646215
> ------------------------------------------------------------------------------
>  lroc
> 
> Logistic model for dmcat
> 
> number of observations =     3354
> area under ROC curve   =   0.8647
> 
> etc...etc...etc...
> 
> My question is: should I be concerned about the results of the boxtid 
> command? Is there something I've done incorrectly, or something else I 
> can or should do?



