# st: transformation of a continuous variable for a logistic regression model

From: Suzy
To: statalist@hsphsun2.harvard.edu
Subject: st: transformation of a continuous variable for a logistic regression model
Date: Tue, 18 Apr 2006 19:47:19 -0400

I am trying to transform one final continuous independent variable (age) in a logistic regression model. I've tried the tools I know of in Stata. For example, I used the fracpoly command, and the best transformation was a second-degree fractional polynomial with powers (3, 3):

Fractional polynomial model comparisons:
---------------------------------------------------------------
age              df    Deviance      Gain   P(term)   Powers
---------------------------------------------------------------
Not in model      0    2098.129        --        --
Linear            1    1834.224     0.000     0.000   1
m = 1             2    1805.957    28.267     0.000   -1
m = 2             4    1791.327    42.897     0.001   3 3
m = 3             6    1790.526    43.699     0.670   -2 3 3
m = 4             8    1788.431    45.793     0.351   -2 -2 3 3
---------------------------------------------------------------
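The Gain and P(term) columns can be reproduced from the deviances alone. The Python sketch below assumes, as the printed numbers bear out, that Gain is measured against the linear model and that each P(term) compares a model with the next simpler row using a chi-squared test whose df is the difference in the df column. Only the deviances come from the output above; the helper names `chi2_sf` and `compare` are hypothetical.

```python
import math

# (label, df, deviance) copied from the fracpoly comparison table
rows = [
    ("Not in model", 0, 2098.129),
    ("Linear", 1, 1834.224),
    ("m = 1", 2, 1805.957),
    ("m = 2", 4, 1791.327),
    ("m = 3", 6, 1790.526),
    ("m = 4", 8, 1788.431),
]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared statistic (only df 1 and 2 are needed here)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 and df = 2")


def compare(rows):
    """Return (label, gain vs. linear, P for this row vs. the previous row)."""
    linear_dev = rows[1][2]
    out = []
    for i, (label, df, dev) in enumerate(rows):
        if i == 0:
            out.append((label, None, None))
            continue
        _, prev_df, prev_dev = rows[i - 1]
        gain = linear_dev - dev
        p = chi2_sf(prev_dev - dev, df - prev_df)
        out.append((label, gain, p))
    return out


for label, gain, p in compare(rows):
    g = "--" if gain is None else f"{gain:.3f}"
    pv = "--" if p is None else f"{p:.3f}"
    print(f"{label:<13} {g:>7} {pv:>6}")
```

The m = 3 gain comes out as 43.698 rather than 43.699 only because the printed deviances are rounded.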

I then used fracgen to generate the new age variables, age_1 and age_2:

fracgen age 3 3
-> gen double age_1 = X^3
-> gen double age_2 = X^3*ln(X)
(where: X = (age+1)/10)
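The two generated variables follow the fractional-polynomial convention for a repeated power: the first term is X^p and the repeat multiplies it by ln(X), with power 0 read as ln(X). The Python sketch below rebuilds age_1 and age_2 for a single age using the shift and scale fracgen printed, X = (age+1)/10. The names `fp_terms` and `age_terms` are hypothetical, and this illustrates the convention rather than fracgen's own code.

```python
import math


def fp_terms(x, powers):
    """Fractional-polynomial terms of a positive x for a sorted list of powers.

    Power 0 means ln(x); a repeated power multiplies the previous term by ln(x).
    """
    if x <= 0:
        raise ValueError("FP terms need x > 0")
    terms = []
    prev_p = None
    for p in powers:
        if p == prev_p:
            term = terms[-1] * math.log(x)
        else:
            term = math.log(x) if p == 0 else x ** p
        terms.append(term)
        prev_p = p
    return terms


def age_terms(age):
    # Shift and scale as reported by fracgen: X = (age + 1) / 10
    x = (age + 1) / 10.0
    return fp_terms(x, [3, 3])


# age 49 gives X = 5, so age_1 = 125.0 and age_2 = 125*ln(5), about 201.18
print(age_terms(49))
```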

The odds ratios for age_1 and age_2 from the full logistic regression model:
------------------------------------------------------------------------------
       Y var | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       age_1 |   1.087994   .0093302     9.83   0.000      1.06986    1.106436
       age_2 |   .9644247   .0037538    -9.31   0.000     .9570955    .9718101
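Because both terms are built from the same X, neither odds ratio is a per-year effect on its own; age acts on the log-odds through b1*X^3 + b2*X^3*ln(X), with b = ln(OR). The Python sketch below, using only the rounded odds ratios printed above and holding the other covariates fixed, computes the odds ratio implied between any two ages, and shows how the reported 95% interval is recovered from an odds ratio and its standard error (SE of the log-odds = SE(OR)/OR). All function names are hypothetical; results are approximate because the inputs are rounded.

```python
import math

# Odds ratios for the two FP terms, copied from the logistic output
OR_AGE_1 = 1.087994    # per unit of X^3
OR_AGE_2 = 0.9644247   # per unit of X^3*ln(X)

B1 = math.log(OR_AGE_1)
B2 = math.log(OR_AGE_2)


def age_logit(age):
    """Contribution of age to the log-odds, other covariates held fixed."""
    x = (age + 1) / 10.0
    return B1 * x ** 3 + B2 * x ** 3 * math.log(x)


def odds_ratio(age_a, age_b):
    """Odds ratio for age_a versus age_b implied by the two-term FP."""
    return math.exp(age_logit(age_a) - age_logit(age_b))


def or_ci(or_hat, se_or, z=1.959964):
    """95% CI on the OR scale, built on the log-odds scale."""
    b = math.log(or_hat)
    se_b = se_or / or_hat
    return math.exp(b - z * se_b), math.exp(b + z * se_b)


print(or_ci(OR_AGE_1, 0.0093302))   # close to the reported 1.06986 to 1.106436
print(odds_ratio(60, 40))           # odds at age 60 relative to age 40 (illustration)
```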

However, boxtid rejected the null hypothesis of linearity for both age_1 and age_2:

       age_1 |   .0100805   .0007172   14.055   Nonlin. dev. 24.646 (P = 0.000)
          p1 |   .0535714   .2122906    0.252
------------------------------------------------------------------------------
       age_2 |  -.0021756   .0004885   -4.453   Nonlin. dev.  7.894 (P = 0.005)
          p1 |   3.864227   2.133377    1.811
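Two relationships can be checked against the printed boxtid numbers: the third column is estimate/SE (so the value beside p1 tests p1 = 0, not linearity), and the P beside Nonlin. dev. matches a chi-squared test on 1 df. This Python sketch recomputes both from the printed values; the dictionary layout and helper names are hypothetical, and the small difference in the age_2 z comes from rounding of the printed estimate and SE.

```python
import math


def wald_z(est, se):
    """Wald z statistic for H0: parameter = 0."""
    return est / se


def chi2_1df_p(dev):
    """Upper-tail P for a chi-squared statistic with 1 df."""
    return math.erfc(math.sqrt(dev / 2.0))


# (estimate, SE) pairs and the nonlinearity deviance, copied from the boxtid output
boxtid = {
    "age_1": {"coef": (0.0100805, 0.0007172), "p1": (0.0535714, 0.2122906), "nonlin_dev": 24.646},
    "age_2": {"coef": (-0.0021756, 0.0004885), "p1": (3.864227, 2.133377), "nonlin_dev": 7.894},
}

for name, r in boxtid.items():
    print(name,
          round(wald_z(*r["coef"]), 3),        # z for the coefficient
          round(wald_z(*r["p1"]), 3),          # z for p1 (tests p1 = 0)
          round(chi2_1df_p(r["nonlin_dev"]), 3))  # P for the nonlinearity test
```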

In all other respects, the preliminary diagnostics look good, e.g. the link test (_hatsq is not significant) and the area under the ROC curve:

------------------------------------------------------------------------------
       dmcat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   .8900851   .1153855     7.71   0.000     .6639337    1.116236
      _hatsq |  -.0319886   .0307101    -1.04   0.298    -.0921793    .0282022
       _cons |  -.0450195   .1069617    -0.42   0.674    -.2546606    .1646215
------------------------------------------------------------------------------
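The _hat/_hatsq table has the form of Stata's link test: the outcome is refitted on the model's linear predictor (_hat) and its square (_hatsq), and a non-significant _hatsq (here P = 0.298) is usually read as no evidence of link misspecification. The pure-Python sketch below illustrates that idea on simulated data; the data, the Newton-Raphson fitter and all names are assumptions for illustration, not Stata's implementation.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(rows, y, iters=15):
    """Logistic regression by Newton-Raphson; each row already includes a constant 1."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]   # X'WX, the information matrix
        for xi, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += w * xi[j] * xi[l]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def link_test(rows, y):
    """Return [_cons, _hat, _hatsq] from refitting y on the linear predictor and its square."""
    beta = fit_logit(rows, y)
    hat = [sum(b * v for b, v in zip(beta, xi)) for xi in rows]
    return fit_logit([[1.0, h, h * h] for h in hat], y)


# Simulated data from a correctly specified model (illustration only)
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1 if random.random() < sigmoid(-0.5 + 1.2 * x) else 0 for x in xs]
cons, b_hat, b_hatsq = link_test([[1.0, x] for x in xs], ys)
print(round(b_hat, 3), round(b_hatsq, 3))
```

With a correctly specified model, the _hat coefficient should sit near 1 and _hatsq near 0, which is the pattern in the table above.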
lroc

Logistic model for dmcat

number of observations = 3354
area under ROC curve = 0.8647
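The area under the ROC curve that lroc reports can be read as a concordance probability: the chance that a randomly chosen observation with the outcome has a higher predicted probability than a randomly chosen one without it, counting ties as one half (the Mann-Whitney form). A brute-force Python sketch with made-up numbers; the function name is hypothetical, and for thousands of observations a sort-based version would be faster.

```python
def roc_area(scores, outcomes):
    """Area under the ROC curve as the proportion of concordant (case, non-case) pairs."""
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0      # concordant pair
            elif sp == sn:
                total += 0.5      # tie counts as half
    return total / (len(pos) * len(neg))


# Toy predicted probabilities and 0/1 outcomes (made up)
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
y = [1, 1, 0, 1, 0, 1, 0]
print(roc_area(p, y))   # 0.75 (9 of 12 pairs concordant)
```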

etc...etc...etc...

My question is: should I be concerned about the boxtid results? Is there something I've done incorrectly, or something else I could or should do?

Thanks for any help or insight on this.

Suzy
