[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Dupont, William" <william.dupont@vanderbilt.edu> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: Restricted cubic splines |

Date |
Fri, 12 Nov 2004 14:41:54 -0600 |

Statalisters We have recently posted an ado file on SSC to calculate restricted cubic splines. Description ----------- rc_spline creates variables that can be used for regression models in which the linear predictor f(xvar) is assumed to equal a restricted cubic spline function of an independent variable xvar. In these regressions, the user explicitly or implicitly specifies k knots located at xvar = t1, t2, ..., tk. f(xvar) is defined to be a continuous smooth function that is linear before t1, is a piecewise cubic polynomial between adjacent knots, and is linear after tk. See Harrell (2001) for additional details. rc_spline creates variables called _Sxvar1, _Sxvar2, ..., _Sxvar(k-1), where "xvar" is the input variable name. There are always one fewer variables created than there are knots. If the model has k parameters beta0, beta1, ... , beta(k-1) then f(xvar) = beta0 + beta1*_Sxvar1 + beta2*_Sxvar2 + ... + beta(k-1)*_Sxvar(k-1). An important aspect of restricted cubic splines is that the variables _Sxvar1, ... , _Sxvar(k-1) are functions of xvar and the knots only and are not affected by the response variable. This means that we can use rc_spline to define the _Sxvar* variables before specifying the response variable or the type of regression model. Restricted cubic splines are also called natural splines. Options ------- nknots specifies the number of knots. knots specifies the exact location of the knots. The values of these knots must be given in increasing order. If both of these options are given they must both specify the same number on knots. When knots is omitted the default knot values are chosen according to Table 2.3 of Harrell (2001) with the additional restriction that the smallest knot may not be less than the 5th smallest value of xvar and the largest knot may not be greater than the 5th largest value of xvar. The values of the all knots are displayed. When knots is omitted the number of knots specified by nknots must be between 3 and 7. The default number of knots when neither nknots nor knots is given is 5. Frequency weights are allowed. Examples -------- * * Perform a linear regression of y against a restricted * cubic spline (RCS) function of x with 5 knots. * . rc_spline x . regress y _Sx1 _Sx2 _Sx3 _Sx4 * * Perform a logistic regression of fate against * the RCS function of x defined above. * . logistic fate _S* * * Perform a linear regression of y against a RCS of x with 3 knots chosen * at their default values according to Harrell (2001). Graph the observed * and expected values of y against x * . drop _S* . rc_spline x, nknots(3) . regress y _S* . predict yhat . scatter y x || line yhat x * * Perform a proportional hazard regression analysis of fate against a RCS * function of x with four knots specified at x = 2, 4, 6 and 8. * . drop _S* . stset time, failure(fate) . rc_spline x, knots(2 (2) 8) . stcox _S* Remarks ------- Restricted cubic splines provide a fairly general and robust approach for adapting linear methods to model non-linear relationships between a response variable and one or more continuous covariates. They can often be used effectively as an alternative to converting continuous to categorical variables, which results in the discarding of information. See Harrell (2001) for arguments in favor of this approach and guidance on how to build models with RCSs. This program is similar to spline (Sasieni 1994). It differs in the choice of default knots and in its output. spline requires the user to specify a response and independent variable. It then allows the user to specify a number of different regression models and version 7 graphs. In contrast, rc_spline only calculates the RCS covariates. However, this allows the use of the full range and power of Stata's regression, post-estimation and v.8 graph commands. In particular, more sophisticated residual analyses and graphs can be generated as well as multiple regression models involving more than one independent variable. See also mkspline for fitting models involving linear splines. Authors ------- William D. Dupont W. Dale Plummer, Jr. Department of Biostatistics Vanderbilt University School of Medicine Nashville, TN 37232-2158 e-mail: william.dupont@vanderbilt.edu dale.plummer@vanderbilt.edu References ---------- Harrell, F.E: Regression Modeling Strategies with Applications to Linear Models, Logistic Regression and Survival Analysis. New York: Springer-Verlag 2001. Sasieni, P: Natural cubic splines STB reprints. 1994; 4: 19-22. See also STB reprints 1995; 4:174, and package snp7_1 from http://www.stata.com/stb/stb24. William D. Dupont phone: 615-322-2001 URL http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/WilliamDupont * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: Keywords in programming** - Next by Date:
**st: variance decomposition** - Previous by thread:
**st: Keywords in programming** - Next by thread:
**st: variance decomposition** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |