It's good that you consider that this makes biological
sense. My main concern was that you were focusing
on the statistical results alone. In curve fitting
there is often a tendency to over-fit and to ignore
substantive or scientific considerations.
I have only two detailed comments to add:
1. Terminology. You call "quadratic" what
[R] fracpoly (and presumably Patrick Royston
and co-authors) would call "degree 2" and what the paper
cited here appears to call "second-order". This may
sound like a parade of synonyms, but my strong guess
is that it is not. With fractional polynomials
the degree is the number of powers, and _not_ the
highest power. Your term "quadratic" therefore
appears quite wrong, especially for
polynomials in which none of the individual powers
is 2. I was reacting to your term without having looked
carefully at the documentation, which explains this
terminology.
2. I have not tried to understand what you are doing
with -boxtid- (which is a user-written command).
But in very general terms my understanding is that,
although quite different polynomials may give
similar overall fits, the individual terms in
those polynomials may not be at all comparable.
The basic underlying issue is very likely that neither
of these kinds of polynomials is orthogonal.
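
To make both points concrete, here is a minimal sketch in Python
(not Stata, and not the code behind -fracpoly- or -fracgen-). It
assumes the usual fractional polynomial convention that a power of 0
means ln(x) and that each repeat of a power multiplies the previous
term by ln(x). The names fp_terms and pearson and the age grid of
20 to 80 are made up for illustration; the scaling X = (age+1)/10 is
the one -fracgen- reported in your output.

```python
import math


def fp_terms(x, powers):
    """Return fractional polynomial basis terms for one positive value x.

    Assumed convention: power 0 stands for ln(x), and every repeat of a
    power multiplies the previous term by another factor of ln(x).
    The "degree" is simply len(powers), not the highest power.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    terms = []
    prev_power = None
    repeat = 0
    for p in sorted(powers):
        base = math.log(x) if p == 0 else x ** p
        # Count how many times this power has already appeared.
        repeat = repeat + 1 if p == prev_power else 0
        terms.append(base * math.log(x) ** repeat)
        prev_power = p
    return terms


def pearson(u, v):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical ages (not your data), scaled as in the fracgen output.
ages = range(20, 81)
scaled = [(age + 1) / 10 for age in ages]

# Degree 2 with powers 3 3: the terms are X^3 and X^3*ln(X).
pairs = [fp_terms(x, [3, 3]) for x in scaled]
term1 = [t[0] for t in pairs]
term2 = [t[1] for t in pairs]

r = pearson(term1, term2)
print("number of terms (degree):", len(pairs[0]))
print("correlation between the two terms:", round(r, 4))
```

With powers 3 3 the model has two terms and so is of degree 2, yet
neither term is a square. The two terms are also highly correlated
over this age range, which is one reason why their individual
coefficients are hard to compare across different polynomials.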
Note that attachments should not be sent to Statalist.
This is explicit in the FAQ.
Nick
n.j.cox@durham.ac.uk
Suzy
> Nick - Not to beat a dead horse, but I just thought I'd share this with
> you - from:<>
>
> Vincenzo Bagnardi, Antonella Zambon, Piero Quatto and Giovanni Corrao.
> Flexible Meta-Regression Functions for Modeling Aggregate Dose-Response
> Data, with an Application to Alcohol and Mortality. Am J Epidemiol 2004;
> 159:1077-1086.
>
> "Although it is rather simple, the family of second-order fractional
> polynomial models offers considerable flexibility. In particular, by
> choosing p1 and p2 from a predefined set P = {–2, –1, –0.5, 0, 0.5, 1,
> 2, 3}, a very rich set of possible functions, including some so-called
> U-shaped and J-shaped relations, may be accommodated. The powers are
> expressed according to the Box-Tidwell transformation (12
> <http://aje.oxfordjournals.org/cgi/content/full/159/11/1077#KWH142C12>),
> in which x^(pi) denotes x^pi if pi != 0 and log x if pi = 0. When
> p1 = p2 = p, the model becomes
> log(RR | x) = β1 x^p + β2 (x^p log x)."
>
> I thought that a second order polynomial = "degree of 2" (M=2) =
> quadratic as shown in my output from fracpoly below (M=2). I had also
> e-mailed the fracplot to show the quadratic curve, but for some reason
> it was deleted in transit. In any case, the age variable
> transformations (age_1 and age_2) from the fracgen command were
> calculated using the formulas above: β1 age^3 + β2 (age^3 log age).
>
> Thus, I still respectfully do not understand why the fracpoly and boxtid
> results are not consistent for this variable. As for a theoretical
> justification of the functional form relating age to the response
> variable, it does make sense for these data.
>
> Nick Cox wrote:
>
> >Sorry, but this to me is just a restatement of
> >your previous posting, and addresses none of
> >the points I raised.
> >
> >That aside,
> >
> >I don't understand how a quadratic function can
> >have powers 3 3. Cubics in my experience are never
> >appropriate for global fits unless there are clear
> >dimensional grounds for using them, which seems unlikely
> >here.
> >
> >Nick
> >n.j.cox@durham.ac.uk
> >
> >Suzy
> >
> >
> >
> >>Thanks for your response Nick. In a nutshell, age is not linear in the
> >>logit. I'm using the fracpoly command to identify the best functional
> >>form for age in the full model. The result returned from fracpoly was a
> >>quadratic function with powers 3 3 (which also looks good with
> >>fracplot). However, when I further assessed the model using the boxtid
> >>command with the new age transformation, the results were not
> >>favorable (the Ho was rejected). When I transformed another
> >>continuous variable in the same full logistic model (quadratic with
> >>powers 1 2 by fracpoly), the boxtid results were favorable, all graphs
> >>looked very good, and the diagnostics were good (linktest, etc...). I'm
> >>trying to understand why my results (fracpoly and boxtid) aren't
> >>consistent for the age variable, but are for all other
> >>continuous variables?
> >>
> >>Nick Cox wrote:
> >>
> >>
> >>
> >>>I am not clear what you think Statalist members know
> >>>that can help you here. For example, the field
> >>>in which you are working, what the response variable
> >>>-dmcat- means, and what other predictors there may be are all
> >>>hidden from view, so the chance of giving opinions
> >>>drawing on substantive expertise is zero. Otherwise
> >>>put, you appear to be assuming that the choices
> >>>here can all be made on purely statistical criteria,
> >>>an attitude which always worries me greatly.
> >>>
> >>>What I have observed, as a kind of anthropologist of
> >>>statistical science, is that age plays very different
> >>>roles in different fields. Economists often seem
> >>>to find that a quadratic in age does very nicely,
> >>>whereas biostatisticians often seem to need
> >>>more complicated representations, which seems
> >>>perfectly plausible given the complexities of
> >>>childhood, adolescence, etc.
> >>>
> >>>Either way, -fracpoly- like other programs has
> >>>no inbuilt sensor (or censor) selecting theoretically or
> >>>scientifically sensible functional forms. So,
> >>>I suggest that you plot the curve implied against
> >>>age and think about it as something that needs justification
> >>>or interpretation independently from the data.
> >>>
> >>>Nick
> >>>n.j.cox@durham.ac.uk
> >>>
> >>>Suzy
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>I am trying to transform one final continuous independent variable (age)
> >>>>in a logistic regression model. I've tried what I know that's available
> >>>>via Stata. For example, I used the fracpoly command and the best
> >>>>transformation was a second order polynomial with powers 3 3.
> >>>>
> >>>>Fractional polynomial model comparisons:
> >>>>---------------------------------------------------------------
> >>>>age             df   Deviance      Gain   P(term)   Powers
> >>>>---------------------------------------------------------------
> >>>>Not in model     0   2098.129        --      --
> >>>>Linear           1   1834.224     0.000    0.000    1
> >>>>m = 1            2   1805.957    28.267    0.000    -1
> >>>>m = 2            4   1791.327    42.897    0.001    3 3
> >>>>m = 3            6   1790.526    43.699    0.670    -2 3 3
> >>>>m = 4            8   1788.431    45.793    0.351    -2 -2 3 3
> >>>>---------------------------------------------------------------
> >>>>
> >>>>
> >>>>I then used fracgen to generate the new age variables - age_1 and age_2.
> >>>>
> >>>>fracgen age 3 3
> >>>>-> gen double age_1 = X^3
> >>>>-> gen double age_2 = X^3*ln(X)
> >>>> (where: X = (age+1)/10)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>The coefficients for age_1 and age_2 from the full logistic regression
> >>>>model:
> >>>>------------------------------------------------------------------------------
> >>>>       Y var | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> >>>>-------------+----------------------------------------------------------------
> >>>>       age_1 |   1.087994   .0093302     9.83   0.000      1.06986    1.106436
> >>>>       age_2 |   .9644247   .0037538    -9.31   0.000     .9570955    .9718101
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>However the boxtid command rejected the null for both age_1
> >>>>and age_2....
> >>>>
> >>>>       age_1 |   .0100805   .0007172   14.055   Nonlin. dev. 24.646 (P = 0.000)
> >>>>          p1 |   .0535714   .2122906    0.252
> >>>>------------------------------------------------------------------------------
> >>>>       age_2 |  -.0021756   .0004885   -4.453   Nonlin. dev.  7.894 (P = 0.005)
> >>>>          p1 |   3.864227   2.133377    1.811
> >>>>
> >>>>
> >>>>In all other respects, the preliminary diagnostics look good...
> >>>>
> >>>>Linktest:
> >>>>------------------------------------------------------------------------------
> >>>>       dmcat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> >>>>-------------+----------------------------------------------------------------
> >>>>        _hat |   .8900851   .1153855     7.71   0.000     .6639337    1.116236
> >>>>      _hatsq |  -.0319886   .0307101    -1.04   0.298    -.0921793    .0282022
> >>>>       _cons |  -.0450195   .1069617    -0.42   0.674    -.2546606    .1646215
> >>>>------------------------------------------------------------------------------
> >>>>lroc
> >>>>
> >>>>Logistic model for dmcat
> >>>>
> >>>>number of observations = 3354
> >>>>area under ROC curve = 0.8647
> >>>>
> >>>>etc...etc...etc...
> >>>>
> >>>>My question is should I be concerned with the results of the Boxtid
> >>>>command? Is there something I've done incorrectly or something else I
> >>>>can do/should do?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/