.-
help for ^boxtid^                                              (STB-49: sg112)
.-

Box-Tidwell and exponential regression models
---------------------------------------------

    ^boxtid^ regression_cmd yvar xvarlist [weight] [^if^ exp] [^in^ range]
            [ ^,^ ^adj^ust^(^adj_list^)^ ^df(^df_list^)^ ^exp^on^(^varlist^)^
            ^dfd^efault^(^#^)^ ^in^it^(^init_list^)^ ^it^er^(^#^)^ ^ltol^erance^(^#^)^
            ^pow^ers^(^numlist^)^ ^tr^ace ^zer^o^(^varlist^)^ regression_cmd_options ]

where

    regression_cmd may be @cox@, @glm@, @logistic@, @logit@, @poisson@, or
    @regress@

and where

    adj_list is a comma-separated list with elements varlist^:^{^mean^|#|^no^},
    except that the first element may optionally be of the form {^mean^|#|^no^}
    to specify the default for all variables.

^boxtid^ shares the features of all estimation commands; see help @est@.

@fracplot@ may be used following ^boxtid^ to show plots of fitted values and
partial residuals. @fracpred@ may be used for prediction.

All weight types supported by regression_cmd are allowed; see help @weights@.


Description
-----------

^boxtid^ is a generalization of @fracpoly@ in which continuous rather than
fractional powers are allowed. ^boxtid^ fits Box & Tidwell's (1962) power
transformation model with predictors in xvarlist to yvar. The model function
for each xvar in xvarlist is

    b1 * xvar^^p1 + b2 * xvar^^p2 + ...

^boxtid^ also fits exponential models for predictors specified in ^expon()^.
The model function for each such xvar in xvarlist is

    b1 * exp(p1 * xvar) + b2 * exp(p2 * xvar) + ...

The quantities p1, p2, ... are real numbers.

After execution, ^boxtid^ leaves variables in the data named ^I^xvar^__1^,
^I^xvar^_p1^, ^I^xvar^__2^, ^I^xvar^_p2^, ..., where xvar represents the first four
letters of the name of xvar1. The new variables contain the best-fitting
transformed xvars and an auxiliary variable for each predictor.


Options
-------

^adjust(^adj_list^)^ defines the adjustment for the covariates xvar1, xvar2, ...
    in xvarlist.
    The default is ^adjust(mean)^, except for binary covariates, where it is
    ^adjust(^#^)^, # being the lower of the two distinct values of the covariate.
    A typical item in adj_list is varlist^:^{^mean^|#|^no^}. Items are separated
    by commas. The first item is special in that varlist^:^ is optional, and if
    omitted, the default is (re)set to the specified value (^mean^ or # or ^no^).
    For example, ^adjust(no, age:mean)^ sets the default to ^no^ and the
    adjustment for ^age^ to ^mean^.

^df(^df_list^)^ sets up the degrees of freedom (df) for each predictor. Each
    power and each b (regression coefficient) counts as 1 df. Predictors
    specified to have 1 df are fitted as linear terms in the model. The first
    item in df_list may be either # or varlist^:^#. Subsequent items must be
    varlist^:^#. Items are separated by commas and varlist is specified in the
    usual way for variables. With the first type of item, the df for all
    predictors are taken to be #. With the second type of item, all members of
    varlist (which must be a subset of xvarlist) have # df.

    The default df for a predictor (specified in xvarlist but not in df_list)
    is assigned according to the number of distinct (unique) values of the
    predictor as follows:

        # of distinct values    default df
        -----------------------------------------
                1               (not applicable)
               2-3              1
               4-5              min(2,^dfdefault()^)
               >=6              ^dfdefault()^
        -----------------------------------------

    Example:  ^df(4)^
    [All variables have 4 df.]

    Example:  ^df(2, weight displ:4)^
    [^weight^ and ^displ^ have 4 df, all other variables have 2 df.]

    Example:  ^df(weight displ:4, mpg:2)^
    [^weight^ and ^displ^ have 4 df, ^mpg^ has 2 df, all other variables have the
    default df.]

    Example:  ^df(weight displ:4, 2)^
    [All variables have 2 df since the final 2 overrides the earlier 4.]

^dfdefault(^#^)^ determines the default maximum degrees of freedom (df) for a
    predictor. Default # is 2.

^iter(^#^)^ sets # to be the maximum number of iterations allowed for the fitting
    algorithm to converge. Default: 100.

^expon(^varlist^)^ specifies that all members of varlist are to be modelled using
    an exponential function, the default being a power (Box-Tidwell) model.
    For each xvar (a member of varlist), a multi-exponential model

        b1 * exp(p1 * xvar) + b2 * exp(p2 * xvar) + ...

    is fitted.

^init(^init_list^)^ sets initial values for the parameters p1, p2, ... of the
    model. By default these are calculated automatically. The first item in
    init_list may be either # [# ...] or varlist^:^# [# ...]. Subsequent items
    must be varlist^:^# [# ...]. Items are separated by commas and varlist is
    specified in the usual way for variables. If the first item is # [# ...],
    this becomes the default initial value for all variables, but subsequent
    items (re)set the initial value for variables in subsequent varlists. If
    the df for a variable in the model is d (greater than 1) then # # ...
    consists of d/2 items. Typically d = 2 so that there is just one initial
    value, #.

^ltolerance(^#^)^ is the maximum difference in deviance between iterations
    required for convergence of the fitting algorithm. Default: 0.001.

^powers(^powerlist^)^ defines the powers to be used with fractional polynomial
    initialization for xvarlist (see Remarks).

^trace^ reports the progress of the fitting procedure towards convergence.

^zero(^varlist^)^ indicates transformation of negative and zero values of all
    xvars in varlist to zero before fitting the model.

regression_cmd_options are any of the options available with regression_cmd.


Remarks
-------

^boxtid^ finds and reports a multiple regression model comprising the maximum
likelihood estimate of p1, p2, ... for each member of xvarlist. The model that
is fit depends on the type of regression_cmd used.

The fitting procedure is iterative and requires accurate starting values for
the powers. ^boxtid^ finds initial values for the p's by fitting a fractional
polynomial of the appropriate degree for each xvar in turn, with the remainder
linear.
This procedure reduces the amount of iteration needed subsequently to obtain
maximum likelihood estimates of the p's.

The table of output includes for each member of xvarlist a test of whether the
relation is linear. That is, it reports the difference in deviance between the
continuous-power polynomial model for an xvar and a model linear in xvar,
conditional on the remaining xvars being nonlinear. A P-value from a chi-square
or F test of the hypothesis of linearity and the estimated linear coefficient
for xvar are also given.

Appropriate estimates of the standard errors of p1, p2, ... are provided in the
table of output, and the standard errors of the corresponding regression
coefficients are correctly estimated. This requires the additional terms

    ln(xvar) * xvar^^p1, ln(xvar) * xvar^^p2, ...

to be included in the model. These terms are represented by the auxiliary
variables ^I^xvar^_p1^, ^I^xvar^_p2^, etc. The estimated t- or z-values for the
coefficients of these terms should be zero to at least 3 decimal places. If
they are not zero, then the estimation procedure probably has not converged
properly, and the value of ^ltolerance()^ should be reduced.

If an xvar has any negative or zero values and the ^expon()^ option is not used,
^boxtid^ behaves exactly like @fracpoly@ in that it subtracts the minimum of xvar
from xvar and adds the rounding (or counting) interval. The interval is defined
as the smallest positive difference between the ordered values of xvar. After
this change of origin, the minimum value of xvar is guaranteed positive.

An example of the ^zero()^ option is in the assessment of the effect of cigarette
smoking on the risk of a disease in an epidemiological study. Since non-smokers
may be qualitatively different from smokers, the effect of quantity smoked,
regarded as a continuous risk factor, may be discontinuous at zero.
The risk may be modelled as a constant for the non-smokers and a polynomial
function of the amount smoked for the smokers by including the ^zero()^ option,
for example

    . ^boxtid logit death num_cigs nonsmok, zero(num_cigs)^

Omission of ^zero()^ here would cause ^num_cigs^ to be transformed before analysis
by the addition of a suitable constant, probably 1.

Convergence of the algorithm is not guaranteed and may be hard to achieve for
models with xvars of degree 2 or more. Sometimes a large negative or positive
power estimate with an enormous standard error is obtained, a sign that the
model may be overparametrized. It is worth trying a lower degree model and
noting whether the deviance increases significantly (chi-square or F test on
2 df).


Examples
--------

    . ^use auto.dta^
    . ^boxtid regress mpg weight^
    . ^boxtid regress mpg weight displ foreign^
    . ^boxtid regress mpg weight displ foreign, df(weight displ:2, foreign:1)^
    . ^boxtid regress mpg displ weight, expon(weight)^


Reference
---------

Box GEP, Tidwell PW. 1962. Transformation of the independent variables.
    Technometrics 4:531-550.


Author
------

    Patrick Royston
    Imperial College School of Medicine, UK
    p.royston@@ic.ac.uk


Also see
--------

    STB:  STB-49 sg112
 Manual:  [R] ^fracpoly^
On-line:  help for @clogit@, @cox@, @fit@, @glm@, @logistic@, @logit@, @poisson@,
          @regress@, @xtgee@
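

Sketch: change of origin by hand
--------------------------------

The change of origin described in Remarks (subtract the minimum of xvar, then
add the smallest positive difference between its ordered values) can be
reproduced by hand to see what the shifted predictor looks like. The commands
below are a minimal sketch under stated assumptions, not ^boxtid^'s own code:
the names x, gap, interval, and x_shifted are hypothetical, and x is assumed to
be a numeric variable with at least two distinct nonmissing values.

    [Order the data so that consecutive observations can be differenced.]
    . ^sort x^

    [Keep only the strictly positive gaps between neighbouring values.]
    . ^generate double gap = x - x[_n-1] if x > x[_n-1] & _n > 1^

    [The smallest positive gap is the rounding (or counting) interval.]
    . ^summarize gap, meanonly^
    . ^scalar interval = r(min)^

    [Shift x so that its minimum equals the interval.]
    . ^summarize x, meanonly^
    . ^generate double x_shifted = x - r(min) + interval^

After these commands the minimum of x_shifted equals interval, which is
positive, so powers and logarithms of x_shifted are defined for every
nonmissing observation.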