.-
help for ^mfracpol^, ^mfplot^                                   (STB-43: sg81)
.-

Multivariable fractional polynomial models
------------------------------------------

    ^mfracpol^ regression_cmd yvar xvarlist [^in^ range] [^if^ exp] [^weight^]
        [^,^ ^al^pha^(^alpha_list^)^ ^cat^zero^(^cvarlist^)^ ^cyc^les^(^#^)^ ^df(^df_list^)^
        ^po^wers^(^power_list^)^ ^sel^ect^(^select_list^)^ ^zer^o^(^varlist^)^
        regression_cmd_options ]

    ^mfplot^ xvar [^,^ graph_options ]


Description
-----------

^mfracpol^ selects the fractional polynomial (FP) model which best predicts the
outcome variable, yvar, from the RHS variables, xvarlist. The type of model is
defined by regression_cmd.

regression_cmd may be any of @regress@/@fit@, @logit@/@logistic@, @cox@, @clogit@,
@glm@, @poisson@, @xtgee@.

^mfracpol^, typed without arguments, redisplays the summary of the final model.

^mfplot^ plots the fractional polynomial transformation for xvar where xvar is
a member of xvarlist selected in the final model. If an xvar is not selected,
it cannot be plotted. Partial residuals are included on the plot.

See help @fracpoly@ for further details of fractional polynomial modelling.


Options
-------

^alpha(^alpha_list^)^ sets the significance levels for testing between FP models
    of different degree. The rules for alpha_list are the same as for df_list
    in the ^df()^ option (see below). The default selection level is 0.05 for
    all variables.

    Example: ^alpha(0.01)^
    [All variables have FP selection level 1%]

    Example: ^alpha(0.05, weight:0.1)^
    [All variables except ^weight^ have FP selection level 5%, ^weight^ has
    level 10%]

^catzero(^cvarlist^)^: for details see "Zeroes and zero categories" under Remarks
    below. cvarlist must be a subset of xvarlist.

^cycles(^#^)^ is the maximum number of iteration cycles permitted. Default: 5.

^df(^df_list^)^ sets up the degrees of freedom (df) for each predictor.
    The df (not counting the regression constant, ^_cons^) are twice the degree
    of the FP, so for example an xvar fitted as a second-degree FP (m = 2)
    has 4 df.

    The first item in df_list may be either # or varlist^:^#. Subsequent items
    must be varlist^:^#. Items are separated by commas and varlist is specified
    in the usual way for variables. With the first type of item, the df for
    all predictors are taken to be #. With the second type of item, all
    members of varlist (which must be a subset of xvarlist) have # df.

    Example: ^df(4)^.
    [All variables have 4 df.]

    Example: ^df(2, weight displ:4)^.
    [^weight^ and ^displ^ have 4 df, all other variables have 2 df.]

    Example: ^df(weight displ:4, mpg:2)^.
    [^weight^ and ^displ^ have 4 df, ^mpg^ has 2 df, all other variables have
    the default of 1 df.]

    Example: ^df(weight displ:4, 2)^.
    [All variables have 2 df since the final 2 overrides the earlier 4.]

    Default: 1 df for all predictors.

^powers(^power_list^)^ is the set of fractional polynomial powers to be used.
    The default set is power_list = {-2 -1 -0.5 0 0.5 1 2 3} (0 giving log).
    ^addpowers(^addlist^)^ appends the powers in addlist to power_list.

^select(^select_list^)^ sets the significance levels for variable selection.
    A variable is dropped if its removal causes a non-significant increase
    in deviance at the relevant selection level. The rules for select_list
    are the same as for df_list in the ^df()^ option (see above). The default
    selection level of 1 for all variables forces them all into the model.
    Setting selection level 1 for a given variable forces it into the model.

    Example: ^select(0.05)^
    [All variables have selection level 5%]

    Example: ^select(0.05, weight:1)^
    [All variables except ^weight^ have selection level 5%, ^weight^ is forced
    into the model]

^zero(^varlist^)^ treats negative and zero values of members of varlist as zero
    when FP transformations are applied. By default, such variables are
    subjected to a preliminary linear transformation to avoid negative and
    zero values (see @fracpoly@).
    varlist must be a subset of xvarlist.

graph_options are any of the standard Stata ^graph, twoway^ options.

regression_cmd_options may be any of the options appropriate to
    regression_cmd, such as ^dead(^deadvar^)^ for ^cox^.


Remarks on ^mfracpol^
-------------------

Fitting algorithm
=================

The fitting algorithm in ^mfracpol^ processes the xvars sequentially.
Initially, the program silently arranges xvarlist in increasing order of
P-value (decreasing statistical significance) in the multiple regression
model consisting of xvarlist with each term linear. The aim is to model
relatively unimportant variables after important variables, which may help
to reduce potential model-fitting difficulties caused by collinearity (or in
the case of FP functions, `concurvity') among the predictors. Different
fitting orders, which may occasionally result in different final models, may
be chosen using the ^xorder()^ option.

^mfracpol^ creates new variables corresponding to the FP functions chosen in
the final model. Each variable is labelled according to its chosen power(s),
using the first 6 characters of the name followed by _1 (for the m=1 power
or the first m=2 power) or _2 (the second m=2 power). Each of the xvars must
therefore be uniquely identified by the first 6 characters of its name. An
error will result if this condition is not fulfilled. The names of variables
in the final model are stored in the global macro ^$S_1^.

Zeroes and zero categories
==========================

The ^zero()^ option permits fitting an FP model to the positive values of a
covariate, taking non-positive values as zero. An application is the
assessment of the effect of cigarette smoking as a risk factor in an
epidemiological study. Non-smokers may be qualitatively different from
smokers, so the effect of smoking (regarded as a continuous variable) may
not be continuous between one and zero cigarettes.
To allow for this, the risk may be modelled as constant for the non-smokers
and an FP function of the number of cigarettes for the smokers:

    . ^gen byte nonsmokr = cond(n_cigs==0, 1, 0) if n_cigs != .^
    . ^mfracpol logit case n_cigs nonsmokr age, zero(n_cigs) df(4,nonsmokr:1)^

Omission of ^zero(n_cigs)^ would cause ^n_cigs^ to be transformed before
analysis by the addition of a suitable constant, probably 1.

A closely related approach involves the ^catzero()^ option. The command

    . ^mfracpol logit case n_cigs age, catzero(n_cigs)^

would achieve a similar result to the previous command, but with important
differences. First, ^mfracpol^ would create the equivalent of the binary
variable ^nonsmokr^ (now called ^n_cigs_0^) automatically and include it in the
model. Second, the two smoking variables (^n_cigs^ and ^n_cigs_0^) would be
linked and treated as a single predictor in the model. With the ^select()^
option for variable selection active, they would be tested simultaneously
for inclusion in the model. A degree of freedom would be allocated to
^n_cigs_0^ in addition to those allowed for fractional polynomial
transformation of ^n_cigs^.

Technical Note: Method of model selection
=========================================

At the initial cycle, the best-fitting FP function for xvar1 (the first of
xvarlist) is determined, with all the other variables assumed linear. All
significance tests are carried out using an approximate P-value calculation
based on a difference in deviances (-2 x log likelihood) having a
chi-squared or F distribution, depending on the regression in use. The best
m=2 model is tested against the alternative of eliminating the variable from
the model, a test which has 4 df. If the test is not significant, the
variable is (temporarily) dropped. If it is significant, the best m=2 model
is tested against the best m=1 model, the latter being chosen if not
significantly worse.
Finally the best m=1 model is tested against a straight line, the latter
being chosen if the test is non-significant, the former otherwise.

The functional form (but NOT the estimated regression coefficients) for
xvar1 is kept, and the process is repeated for xvar2, xvar3, etc. The first
cycle concludes when all the variables have been processed in this way. The
next cycle is similar, except that the functional forms from the initial
cycle are retained for all variables except the one currently being
processed.

A variable whose functional form is assumed linear is processed similarly,
except that a test with 1 df is applied to determine if the variable is to
be included in the model or dropped.

Updating of FP functions and candidate variables continues until the
functions and variables included in the overall model do not change
(convergence). Convergence is usually achieved within 1-4 cycles.


Remarks on ^mfplot^
-----------------

^mfplot^ actually produces a component-plus-residual plot. For normal-error
models with constant weights and a single covariate, this amounts to a plot
of the observations with the fitted line inscribed. For other normal-error
models, weighted residuals are calculated and added to the fitted values.

For models with additional covariates, the line is the partial linear
predictor corresponding to the specified xvar. The values are adjusted so
that their mean is the same as the mean of the fitted values from the
complete model.

For logistic models, the fitted values are plotted as logits. Deviance
residuals are calculated and are added to the (partial) linear predictor to
give component-plus-residual values. These are plotted as small circles.

For Cox and GEE models, only the linear predictor (a.k.a. index) is plotted.


Examples
--------

    . ^mfracpol regress mpg weight displ foreign, df(4, foreign:1)^
    . ^mfracpol regress mpg weight displ foreign, df(1, weight displ:4)^
    . ^mfracpol regress mpg weight displ foreign, df(2, foreign:1)^
            ^select(0.05, foreign:1) alpha(0.1) xorder(n)^
    . ^mfplot displ^


Saved Results
-------------

^mfracpol^ saves in the ^$S_^# macros:

    ^S_1^   names of variables in final model, including fractional
            polynomial-transformed variables if any
    ^S_2^   deviance of the final model.

^mfracpol^ also saves numerous quantities in ^$S_E_^ macros.


Authors
-------

    Patrick Royston
    Imperial College School of Medicine, UK
    proyston@@rpms.ac.uk

    Gareth Ambler
    Imperial College School of Medicine, UK
    gambler@@rpms.ac.uk


Also see
--------

    STB:  STB-43 sg81
 Manual:  [R] ^fracpoly^
On-line:  ^help^ for @clogit@, @cox@, @fit@, @glm@, @logistic@, @logit@, @poisson@,
          @regress@, @xtgee@.
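

Illustration: an FP function written out by hand
------------------------------------------------

The following is a minimal sketch, not output from ^mfracpol^. It assumes that a
dataset containing ^mpg^, ^weight^, ^displ^ and ^foreign^ (the variables used under
Examples) is in memory, and shows what a second-degree FP (m = 2, 4 df) in
^weight^ amounts to if its two powers happen to be -1 and 0 (0 giving log).
The powers, and the variable names ^wt_a^ and ^wt_b^, are chosen purely for
illustration. ^mfracpol^ may select different powers, names its own variables
as described under Remarks, and may apply a preliminary transformation to
^weight^ first (see @fracpoly@).

    . ^gen double wt_a = 1/weight^
    . ^gen double wt_b = ln(weight)^
    . ^regress mpg wt_a wt_b displ foreign^

Here ^wt_a^ is the power -1 term and ^wt_b^ is the power 0 (log) term. Together
with their two regression coefficients, the two chosen powers account for the
4 df allotted to a second-degree FP.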