
From  Suzy <scott_788@wowway.com> 
To  statalist@hsphsun2.harvard.edu 
Subject  Re: st: RE: transformation of a continuous variable for a logistic regression model 
Date  Wed, 19 Apr 2006 11:22:37 -0400 
I am not clear what you think Statalist members know that can help you here. For example, the field in which you are working, what the response variable dmcat means, and what other predictors there may be are all hidden from view, so the chance of giving opinions drawing on substantive expertise is zero. Otherwise put, you appear to be assuming that the choices here can all be made on purely statistical criteria, an attitude which always worries me greatly.

What I have observed, as a kind of anthropologist of statistical science, is that age plays very different roles in different fields. Economists often seem to find that a quadratic in age does very nicely, whereas biostatisticians often seem to need more complicated representations, which seems perfectly plausible given the complexities of childhood, adolescence, etc.

Either way, fracpoly like other programs has no inbuilt sensor (or censor) selecting theoretically or scientifically sensible functional forms. So, I suggest that you plot the curve implied against age and think about it as something that needs justification or interpretation independently from the data.
Nick n.j.cox@durham.ac.uk
Suzy
I am trying to transform one final continuous independent variable (age) in a logistic regression model. I have tried the methods I know of that are available in Stata. For example, I used the fracpoly command, and the best transformation was a second-degree fractional polynomial with powers 3 3.
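Fractional polynomial model comparisons:
---------------------------------------------------------------
age              df       Deviance      Gain   P(term)  Powers
---------------------------------------------------------------
Not in model      0       2098.129
Linear            1       1834.224     0.000     0.000  1
m = 1             2       1805.957    28.267     0.000  1
m = 2             4       1791.327    42.897     0.001  3 3
m = 3             6       1790.526    43.699     0.670  2 3 3
m = 4             8       1788.431    45.793     0.351  2 2 3 3
---------------------------------------------------------------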
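
The P(term) column can be checked by hand from the deviances. The following is a minimal sketch in Python (not Stata). It assumes that each P(term) is a chi-square test of the drop in deviance from the row directly above, on the difference in df. That reading is an assumption, though it is consistent with the numbers in the table. The helper chi2_sf is written only for this illustration and handles just 1 or 2 degrees of freedom, which is all the table needs.

```python
import math

# Deviances and df copied from the fracpoly comparison table.
deviance = {
    "Not in model": 2098.129,
    "Linear": 1834.224,
    "m = 1": 1805.957,
    "m = 2": 1791.327,
    "m = 3": 1790.526,
    "m = 4": 1788.431,
}
df = {"Not in model": 0, "Linear": 1, "m = 1": 2, "m = 2": 4, "m = 3": 6, "m = 4": 8}


def chi2_sf(x, k):
    """Upper-tail chi-square probability; illustrative helper for k = 1 or 2 only."""
    if k == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if k == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


rows = list(deviance)  # dicts keep insertion order, so rows follow the table
p_term = {}
for prev, cur in zip(rows, rows[1:]):
    drop = deviance[prev] - deviance[cur]          # deviance drop vs. row above
    p_term[cur] = chi2_sf(drop, df[cur] - df[prev])
    print(f"{cur:<7} vs {prev:<13} drop = {drop:8.3f}   p = {p_term[cur]:.3f}")
```

Under that assumption the sketch gives 0.001 for m = 2, 0.670 for m = 3 and 0.351 for m = 4, matching the table: going beyond m = 2 buys very little reduction in deviance.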

I then used fracgen to generate the new age variables age_1 and age_2.
fracgen age 3 3
-> gen double age_1 = X^3
-> gen double age_2 = X^3*ln(X)
   (where: X = (age+1)/10)
The coefficients for age_1 and age_2 from the full logistic regression model:

       Y var | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       age_1 |   1.087994   .0093302     9.83   0.000      1.06986    1.106436
       age_2 |   .9644247   .0037538    -9.31   0.000     .9570955    .9718101
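
One way to see what these two odds ratios imply together is to evaluate the fitted age curve directly. The following is a rough sketch in Python (not Stata), under stated assumptions: the coefficients are taken as the logs of the reported odds ratios, the scaling X = (age+1)/10 is the one reported by fracgen, and the age grid of 20 to 90 is purely hypothetical because the observed age range is not given. The curve is the age contribution to the log odds only; the constant and the other predictors are left out.

```python
import math

# Coefficients on the log-odds scale: logs of the reported odds ratios.
b1 = math.log(1.087994)    # coefficient on age_1 = X^3
b2 = math.log(0.9644247)   # coefficient on age_2 = X^3 * ln(X)


def age_logit(age):
    """Age contribution to the linear predictor (log odds), up to a constant."""
    x = (age + 1) / 10.0   # scaling reported by fracgen
    return b1 * x**3 + b2 * x**3 * math.log(x)


# ASSUMPTION: hypothetical age grid; the real data range is not stated.
curve = {age: age_logit(age) for age in range(20, 91, 5)}
for age, value in curve.items():
    print(f"age {age:3d}   log-odds contribution {value:7.3f}")

# Setting the derivative with respect to X to zero gives
#   ln(X) = -(3*b1 + b2) / (3*b2),
# which locates the turning point of the implied curve.
x_turn = math.exp(-(3 * b1 + b2) / (3 * b2))
age_turn = 10 * x_turn - 1
print(f"implied turning point at about age {age_turn:.1f}")
```

With these inputs the implied curve rises with age and then turns down somewhere in the early seventies. Whether that turning point matters depends on whether it falls inside the observed age range, and whether such a shape is sensible is a substantive question rather than a statistical one.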
However, the boxtid command rejected the null hypothesis of linearity for both age_1 and age_2:

       age_1 |  .0100805   .0007172   14.055   Nonlin. dev. 24.646   (P = 0.000)
          p1 |  .0535714   .2122906    0.252

       age_2 |  .0021756   .0004885    4.453   Nonlin. dev.  7.894   (P = 0.005)
          p1 |  3.864227   2.133377    1.811
In all other respects, the preliminary diagnostics look good...
Linktest:


       dmcat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   .8900851   .1153855     7.71   0.000     .6639337    1.116236
      _hatsq |  -.0319886   .0307101    -1.04   0.298    -.0921793    .0282022
       _cons |  -.0450195   .1069617    -0.42   0.674    -.2546606    .1646215


lroc
Logistic model for dmcat
number of observations = 3354
area under ROC curve = 0.8647
etc...etc...etc...
My question is: should I be concerned about the results of the boxtid command? Have I done something incorrectly, or is there something else I can or should do?
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/