.-
help for ^fp^                        (STB-21: sg26, STB-22: sg26.1, STB-25: sg26.3)
.-

Fractional polynomial modeling
------------------------------

        ^fp^ yvar [nvar] xvar [^in^ range] [^if^ exp] [^weight^] [^,^ options ]

        ^fp^ [^,^ ^s^ummary ^c^omparison ^e^stimates ]

The major_options (most used options), in alphabetic order, are:

        ^ba^se^(^basevars^)^  ^cm^d^(^regression_cmd^)^  ^com^parison  ^df(^#^)^
        ^est^imates  ^fix^powers^(^fixlist^)^  ^log^  ^po^wers^(^powlist|^none)^
        regression_cmd_options

The minor_options (less used options), in alphabetic order, are:

        ^add^powers^(^addlist^)^  ^cont^inuous^(^contvars^)^  ^dev^thr^(^#^)^
        ^ex^px^(^#|^sd^|^+^|^-)^  ^fast^  ^gp^lot^(^#^)^  ^na^me^(^newxvar^)^
        ^ori^gin^(^#^)^  ^norep^eat  ^sav^ing^(^graph_file [^, replace^]^)^  ^zer^o


Description
-----------

The first form of the ^fp^ command performs the analysis. The second form (re)displays results already obtained with the first form.

^fp^ fits fractional polynomial (FP) models in xvar to yvar. A wide variety of model types is supported, including normal-errors regression (the default), logistic, Cox, Poisson, and so on (see the ^cmd()^ option for further details). nvar is required if `blocked' logit (^blogit^) or probit (^bprobit^) models are to be fitted; yvar is then the number of successes out of nvar trials.

^fp^ fits FPs of degree m = 1 or m = 2. To fit models with m > 2, use either the ^fixpowers()^ option (see below) or the ^fpx^ command (see ^help fpx^). The program supplies the fractional powers of xvar, but you may alter them by using the ^powers()^ and ^addpowers()^ options.

`Base model' variables are variables that are always included in the model. By default, the base model consists only of the regression constant; you may add other variables by using the ^base()^ option. The models actually fitted therefore consist of the base variables plus the terms representing FPs in xvar. The latter variables are calculated automatically.
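The Python sketch below is a rough illustration of how FP terms in xvar might be built from a set of powers. It is not ^fp^'s own code. It assumes the usual fractional polynomial convention: power 0 means ln(x), and a power that has already been used k times contributes x^p * ln(x)^k. Under this convention, powers (1,1) give x and x*ln(x), which are exactly the Box-Tidwell terms. The names fp_terms and m2_power_pairs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Default power set listed under the powers() option (0 stands for ln).
DEFAULT_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_power(x: float, p: float) -> float:
    """Return x**p, or ln(x) when p == 0; x must be positive."""
    if x <= 0:
        raise ValueError("FP terms need positive x; shift or rescale x first")
    return math.log(x) if p == 0 else x ** p


def fp_terms(xs: Sequence[float], powers: Sequence[float]) -> List[List[float]]:
    """Build one column per power. Assumed repeated-power convention:
    a power p already used k times contributes x**p * ln(x)**k."""
    columns: List[List[float]] = []
    used: dict = {}  # power -> number of times already used
    for p in sorted(powers):
        k = used.get(p, 0)
        used[p] = k + 1
        columns.append([fp_power(x, p) * math.log(x) ** k for x in xs])
    return columns


def m2_power_pairs(powers: Sequence[float] = DEFAULT_POWERS) -> List[Tuple[float, float]]:
    """Candidate (p1, p2) pairs with p1 <= p2, repeated powers included."""
    ps = sorted(powers)
    return [(a, b) for i, a in enumerate(ps) for b in ps[i:]]


# Powers (1, 1) give x and x*ln(x): the Box-Tidwell terms.
print(fp_terms([1.0, 2.0, 4.0], (1, 1)))
print(len(DEFAULT_POWERS), len(m2_power_pairs()))  # 8 36
```

The eight default powers give 8 candidate m = 1 models and 36 candidate m = 2 pairs. Eight of those pairs are repeated powers, which ^norepeat^ would exclude.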
In what follows, all models are assumed to include the base variable(s).

^fp^ also fits linear, quadratic and cubic polynomials, and the Box-Tidwell model (`BoxTid'), namely

        Y = b0 + b1 * X + b2 * X * ln(X)

If `BoxTid' fits significantly better than a straight line, there is evidence of curvature in the relation between yvar and xvar.

^fp^ reports the deviances for all models (see the STB-21 article for definitions and formulae). If the ^comparison^ option is used, ^fp^ compares the fits of various pairs of models using P-values. These P-values are approximate (typically conservative) when FP models are compared with other models. The most important comparisons are between m = 1 and linear, and between m = 2 and m = 1. If the deviance of a cubic polynomial (for which m = 3) is much lower than that of the best m = 2 model, you will need a model with m > 2 to fit the data adequately.

^fp^ also runs regression_cmd for the best-fitting model. You may `replay' this model by repeating regression_cmd, or by using the ^estimates^ option, which gives slightly more information. You may also use ^predict^, ^test^, etc., after ^fp^; the results will depend on the regression_cmd you used.

If xvar has any negative or zero values, ^fp^ subtracts the minimum of xvar from xvar and then adds the rounding (or counting) interval. This interval is the smallest positive difference between the ordered values of xvar. After this transformation the minimum of xvar is positive, so FP models can be used. The amount added to xvar is stored in ^$S_23^. See the ^zero^ option (under Minor options below) for an alternative approach.


Major options
-------------

^base()^ defines the variables in the base model. The default is none (the base model is then typically just the constant term).

^cmd()^ determines the type of regression to be used. The default, equivalent to specifying ^cmd(regress)^, is ordinary regression.
Other regression commands supported include ^anova^, ^logit^, ^probit^, ^blogit^, ^bprobit^, ^mlogit^, ^poisson^, ^cox^, ^qreg^ and ^glmr^. Other commands will work if they store the log-likelihood for the model in ^_result(2)^. Please read the technical note on the ^blogit^ and ^bprobit^ commands, which follows the Minor options section.

^comparison^ performs significance tests between various pairs of models (see Description above).

^df(^#^)^ specifies (a) the degrees of freedom of the highest-degree FP model and (b) that models with df = # are to be compared with those with df = # - 1. For example, ^df(3)^ compares m = 2 models in which one power is always 1 with the best m = 1 model. ^df(1)^ is equivalent to ^powers(1)^. If ^df()^ is not specified (the default), ^fp^ finds the best m = 2 model and compares its fit with that of the best m = 1 model.

^estimates^ displays the results of the regression command used to fit the best FP model (see ^cmd()^ and Description above).

^fixpowers()^ includes the fractional power(s) of xvar given in fixlist in every FP model fitted. The powers must be in increasing order. The degree of the FP is increased by the number of values in fixlist. For example, with ^fixpowers(0,1)^, all models will include terms in ln(xvar) and xvar; the FP degree will then be 3 for `m = 1' models and 4 for `m = 2' models. The results mark models containing ^fixpowers()^ terms with a plus (+) symbol.

^log^ displays deviance differences from the base model and (for normal-errors regression) residual standard deviations for each FP model fitted.

^powers()^ is the set of fractional polynomial powers to be used. The default set is powlist = {-2, -1, -0.5, 0, 0.5, 1, 2, 3} (0 giving log). Specifying ^powers(none)^ prevents ^fp^ from searching for the best model; it uses only the powers defined by ^fixpowers()^.

regression_cmd_options are options appropriate to the regression ^cmd()^ in use.
For example, for ^cmd(cox)^, regression_cmd_options must include the name of the censoring variable (deadvar), specified as ^dead(^deadvar^)^ in the usual way.


Options for the second form of ^fp^
---------------------------------

^comparison^, ^estimates^ -- same as for the first form; see Major options.

^summary^ repeats the table of deviances etc., which is the minimum output from the first form.


Examples   (Note: Description of the Minor options follows Examples.)
--------

Using Stata's ^auto.dta^ dataset, we want to model a car's fuel economy (^mpg^) in terms of its engine size (^displ^).

        . ^use auto.dta^
        . ^graph mpg displ^

The scatter plot suggests curvature in the relation. We try fitting FP models:

        . ^fp mpg displ^

The results show that the best power for ^displ^ in an m = 1 model is -2, with a deviance of 400.59. The best m = 2 model has powers (-2,3) and a deviance of 397.97. This is only 2.62 lower and is not statistically significant. (Note that a cubic polynomial, with a deviance of 399.93, is required to give a fit as good as that of the best m = 1 model.)

Preliminary stepwise regression analysis indicated that ^foreign^ and ^weight^ are also significant predictors of ^mpg^, so we include them in the model as base variables:

        . ^fp mpg displ, base(foreign weight)^

The same m = 1 power (-2) is obtained. The deviance is now 376.53, a significant reduction compared with the model without ^foreign^ and ^weight^. The m = 2 model is no better than the m = 1 model. The m = 1 model also has a lower deviance than a cubic polynomial (377.59), which has two more terms.

The scaled origin of ^displ^ (its minimum divided by its maximum) is 0.186. We generally find that a value between about 0.05 and 0.2, typically 0.1, gives good results in FP modelling. To check the effect on the fit of the m = 1 model, we reset the scaled origin to 0.1 using the ^origin()^ option and refit the m = 1 model.
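The claim above that the m = 2 model's 2.62 reduction in deviance is not statistically significant can be checked approximately by hand. The Python sketch below refers the deviance difference between two nested models to a chi-square distribution, with degrees of freedom equal to the difference in model df. Here that difference is 2, if an m = 2 FP counts as 4 df and an m = 1 FP as 2 df, consistent with the ^df()^ option. This is an assumed approximation, not necessarily the exact test ^fp^ performs; for normal-errors regression, for instance, an F-based test may be used instead. The helpers chi2_sf and compare are hypothetical names.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exp), then applies
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def compare(dev_simpler: float, dev_richer: float, df_diff: int):
    """Deviance reduction achieved by the richer model and its approximate P-value."""
    gain = dev_simpler - dev_richer
    return gain, chi2_sf(gain, df_diff)


# Best m = 1 (deviance 400.59) vs best m = 2 (397.97) for mpg on displ.
gain, p = compare(400.59, 397.97, df_diff=2)
print(f"gain = {gain:.2f}, approximate P = {p:.2f}")  # gain = 2.62, approximate P = 0.27
```

A P-value of about 0.27 agrees with the statement that the reduction is not significant. The value 3.84 suggested under ^devthr()^ (Minor options) is the 95th percentile of chi-square on 1 df; chi2_sf(3.84, 1) is about 0.05.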
We then display the value of the origin (zeta), in the same units as ^displ^, that ^fp^ used when transforming ^displ^:

        . ^fp mpg displ, base(foreign weight) one origin(0.1)^
        . ^display $S_23^

The deviance has fallen by 0.73, from 376.53 to 375.80. This indicates a better fit, though not a statistically significantly better one. The value of zeta is 40.55.

Finding the best FP functions for ^mpg^ of several covariates simultaneously (such as ^displ^, ^weight^, ^gratio^) is beyond the scope of ^fp^. A program to do so is under development.


Minor options
-------------

^addpowers()^ adds powers to the list defined by ^powers()^. This saves you from typing the whole default powlist when you simply wish to add a few extra powers to it.

^continuous()^ applies only to ^cmd(anova)^. contvars is the set of variables in basevars that are continuous (by default, basevars are treated as categorical).

^devthr(^#^)^ sets a deviance threshold. ^fp^ evaluates confidence intervals for the fitted values of every model whose deviance is below #. The `envelope' of these confidence intervals is stored in new variables called ^_etal^ (lower limit) and ^_etah^ (upper limit); if variables with these names already exist, they are quietly dropped. The intervals will usually be wider than those obtained from the best-fitting FP model, sometimes considerably so. # will typically be set 3.84 above the deviance of the best-fitting model. Because that deviance must be found first, ^devthr()^ is used only after an initial run has identified the best-fitting model.

^ex^px^(^#|^sd^|^+^|^-)^ transforms xvar to exp(k * xvar). This option is useful if yvar is expected to level off at high or low values of xvar. Because exp() is always positive, the transformation also ensures a positive xvar when xvar has negative values. If # is specified, then k = #. If ^sd^ is specified, then k = -sd(xvar), where sd(xvar) is the standard deviation of xvar. ^+^ and ^-^ are valid only in conjunction with the ^origin()^ option.
If ^+^ (^-^) is specified, k is the positive (negative) number that gives the required origin after the exp transformation. Use of ^expx(sd)^ has proved quite successful when no clear value of # suggests itself (i.e., usually). See also the ^origin()^ option.

^fast^ enables more efficient searching for the best combination of powers (m = 2 models only). For a given first power p1, suppose the deviance for the current second power (p2) is higher than that for the preceding p2. The current p1 is then abandoned, and the search continues with the next p1 and a new set of p2's. This option works reliably only under one condition. For each p1 in ^powers()^, no model among those defined by the values of p2 in ^powers()^ may have a deviance that is a local but not a global minimum. This is a strong condition, though often satisfied. Use of ^fast^ can greatly speed up ^fp^.

^gplot(^#^)^ produces a plot of the gain against the first power (p1). The gain is the deviance of a straight-line model minus that of the current model; the bigger the gain, the better the model fits. The gains for m = 2 models, with powers (p1, p2), are shown using the value of p2 as the plotting symbol. # determines what is plotted: # = 0 shows the gains for all models, whereas # = m shows degree-m models only.

^name()^ specifies an alternative prefix for the variable(s) created by ^fp^ that contain fractional power(s) of xvar. The default prefix is ^X^. For example, a run with m = 2 will add variables ^X_1^ and ^X_2^ to the dataset; specifying ^name(fpx)^ will name these new variables ^fpx_1^ and ^fpx_2^ instead.

^origin(^#^)^ causes xvar to be transformed so that its maximum is 1 and its minimum is #. Specifically, the transformation is defined as follows:

        zeta = (xmin - # * xmax) / (1 - #)
        replace xvar with (xvar - zeta) / (xmax - zeta)

where xmin and xmax are the minimum and maximum, respectively, of xvar.
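The Python sketch below covers the arithmetic of ^origin()^ and of the default shift for non-positive xvar (described under Description). It only mirrors the formulas given in this help file and is not ^fp^'s code; the function names are hypothetical. It assumes ^auto.dta^ values of 79 (minimum) and 425 (maximum) for ^displ^. These values reproduce the scaled origin of 0.186 and the zeta value (40.55...) quoted in Examples.

```python
from typing import List, Sequence, Tuple


def origin_transform(xs: Sequence[float], origin: float) -> Tuple[float, List[float]]:
    """Map xs so that max -> 1 and min -> origin, using
    zeta = (xmin - origin*xmax) / (1 - origin) and x' = (x - zeta) / (xmax - zeta)."""
    if not 0 < origin < 1:
        raise ValueError("origin must lie strictly between 0 and 1")
    xmin, xmax = min(xs), max(xs)
    if xmin == xmax:
        raise ValueError("xvar must not be constant")
    zeta = (xmin - origin * xmax) / (1 - origin)
    return zeta, [(x - zeta) / (xmax - zeta) for x in xs]


def default_shift(xs: Sequence[float]) -> Tuple[float, List[float]]:
    """Default handling of non-positive xvar: subtract the minimum, then add the
    smallest positive gap between ordered values. Returns (amount added, new xs)."""
    if min(xs) > 0:
        return 0, list(xs)  # already positive: nothing to do
    ordered = sorted(set(xs))
    if len(ordered) < 2:
        raise ValueError("xvar must take at least two distinct values")
    gap = min(b - a for a, b in zip(ordered, ordered[1:]))
    amount = gap - ordered[0]
    return amount, [x + amount for x in xs]


# displ in auto.dta, assuming min 79 and max 425 (the middle value is arbitrary).
zeta, scaled = origin_transform([79.0, 200.0, 425.0], 0.1)
print(round(79 / 425, 3), round(zeta, 3))  # 0.186 40.556
```

Suppose default_shift is applied to whole-number cigarette counts that include zero. It adds 1, the smallest gap between such counts, which matches the constant mentioned under the ^zero^ option.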
This transformation is useful if xvar contains negative and/or zero values, or if its range is too narrow for effective FP modelling. # must be between 0 and 1 exclusive; a good general-purpose recommendation is # = 0.05. See also the ^expx()^ option. The calculated value of zeta is saved in ^$S_23^.

^norepeat^ prevents the use of "repeated power" models such as p1 = p2 = 1. This option may be used in conjunction with ^powers()^ to force ^fp^ to fit models with conventional polynomial terms only.

^saving()^ is used only with ^gplot()^ and saves the graph in graph_file.

^zero^ treats negative and zero values of xvar as zero in the FP regression analyses. (The default, ^nozero^, transforms xvar if necessary to remove non-positive values.) The ^zero^ option lets you fit an FP model only to the positive values of xvar, redefining the non-positive values as zero. One example is assessing the effect of cigarette smoking on the risk of a disease in an epidemiological study. Non-smokers are often qualitatively different from smokers, so the effect of smoking (regarded as a continuous variable) may be discontinuous at zero. The risk may be modelled as a constant for the non-smokers and an FP function of the number smoked for the smokers:

        . ^fp outcome num_cigs, zero^

Without the ^zero^ option, ^num_cigs^ would be transformed before analysis by the addition of a suitable constant, probably 1.


Technical note: blogit and bprobit regression commands
------------------------------------------------------

^cmd(blogit^|^bprobit)^ are implemented in a way that requires ^fp^ (a) to double the length of the data and (b) to create two new variables, called ^_y_eq_1^ and ^_pop^. ^fp^ then fits the FP model with the standard ^logit^ or ^probit^ command, using response variable ^_y_eq_1^ and frequency weight ^[fw=_pop]^.
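The Python sketch below illustrates the data doubling described above. It assumes one plausible layout: each grouped record (y successes out of n trials) becomes one row with _y_eq_1 = 1 and _pop = y, and one row with _y_eq_1 = 0 and _pop = n - y. The exact row order ^fp^ uses is not documented here, and the function names are hypothetical. The check shows that the frequency-weighted binary log-likelihood over the doubled rows equals the grouped binomial log-likelihood, apart from the constant binomial coefficient.

```python
import math
from typing import Dict, List, Sequence


def double_grouped(ys: Sequence[int], ns: Sequence[int]) -> List[Dict[str, int]]:
    """Expand grouped records (y successes out of n trials) into two weighted
    binary rows each: _y_eq_1 = 1 with _pop = y, and _y_eq_1 = 0 with _pop = n - y."""
    rows: List[Dict[str, int]] = []
    for i, (y, n) in enumerate(zip(ys, ns)):
        if not 0 <= y <= n:
            raise ValueError("need 0 <= successes <= trials")
        rows.append({"obs": i, "_y_eq_1": 1, "_pop": y})
        rows.append({"obs": i, "_y_eq_1": 0, "_pop": n - y})
    return rows


def weighted_loglik(rows: List[Dict[str, int]], probs: Sequence[float]) -> float:
    """Frequency-weighted binary log-likelihood; probs[obs] is P(success)."""
    total = 0.0
    for r in rows:
        if r["_pop"] == 0:
            continue  # zero-weight rows contribute nothing
        p = probs[r["obs"]]
        total += r["_pop"] * math.log(p if r["_y_eq_1"] == 1 else 1.0 - p)
    return total


def grouped_loglik(ys: Sequence[int], ns: Sequence[int], probs: Sequence[float]) -> float:
    """Binomial log-likelihood without the constant log C(n, y) term."""
    total = 0.0
    for y, n, p in zip(ys, ns, probs):
        if y > 0:
            total += y * math.log(p)
        if n - y > 0:
            total += (n - y) * math.log(1.0 - p)
    return total


ys, ns, probs = [3, 0, 5], [10, 4, 5], [0.3, 0.2, 0.9]
rows = double_grouped(ys, ns)
print(len(rows), weighted_loglik(rows, probs), grouped_loglik(ys, ns, probs))
```

The two log-likelihoods agree for any set of probabilities, so maximizing one is the same as maximizing the other. This is why a weighted ^logit^ or ^probit^ fit on the doubled data can stand in for the grouped-data model.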
When you have finished the analysis, you will probably wish to restore the dataset to its original length by removing the `extra' cases. You can do this by typing

        . ^drop if _y_eq_1^

The FP replay commands for ^blogit^ or ^bprobit^ models will then be inoperative until the next time these models are fitted.


Saved Results
-------------

^fp^ saves the following in the ^$S_^# macros.

^S_1^ to ^S_7^ are deviances:

    ^S_1^    base model
    ^S_2^    base model + linear model
    ^S_3^    base model + quadratic polynomial
    ^S_4^    base model + cubic polynomial
    ^S_5^    base model + Box-Tidwell model
    ^S_6^    base model + best m = 1 model
    ^S_7^    base model + best m = 2 model

^S_8^ to ^S_12^ are powers and residual standard deviations:

    ^S_8^    best power for m = 1 model
    ^S_9^    best power1 for m = 2 model
    ^S_10^   best power2 for m = 2 model
    ^S_11^   residual SD for best m = 1 model (^regress^ and ^anova^ only)
    ^S_12^   residual SD for best m = 2 model (^regress^ and ^anova^ only)

^S_13^ to ^S_22^ are P-values for model comparisons:

    ^S_13^   linear + base vs. base
    ^S_14^   quadratic poly + base vs. linear + base
    ^S_15^   cubic poly + base vs. quadratic poly + base
    ^S_16^   Box-Tidwell model + base vs. linear + base
    ^S_17^   best m = 1 + base vs. base
    ^S_18^   best m = 1 + base vs. linear + base
    ^S_19^   best m = 2 + base vs. base
    ^S_20^   best m = 2 + base vs. linear + base
    ^S_21^   best m = 2 + base vs. quadratic poly + base
    ^S_22^   best m = 2 + base vs. best m = 1 + base

^S_23^ to ^S_25^ are miscellaneous:

    ^S_23^   zeta, or the amount added to xvar (see Description, ^origin()^ option and Examples)
    ^S_24^   value (if any) of xvar at which the FP function has a maximum or minimum (m = 2 only)
    ^S_25^   whether ^$S_24^ is a maximum or a minimum (m = 2 only)


Also see
--------

    STB:  STB-21: sg26, STB-22: sg26.1, STB-25: sg26.3
On-line:  help for @fpgraph@, @fpgen@, @fpx@