**[R] mkspline** -- Linear and restricted cubic spline construction

__Syntax__

Linear spline with knots at specified points

**mkspline** *newvar_1* *#1* [*newvar_2* *#2* [*...*]] *newvar_k* **=** *oldvar* [*if*] [*in*]
[**,** __m__**arginal** __di__**splayknots**]

Linear spline with knots equally spaced or at percentiles of data

**mkspline** *stubname* *#* **=** *oldvar* [*if*] [*in*] [*weight*] [**,** __m__**arginal** __p__**ctile**
__di__**splayknots**]

Restricted cubic spline

**mkspline** *stubname* **=** *oldvar* [*if*] [*in*] [*weight*] **, cubic** [__nk__**nots(***#***)**
__k__**nots(***numlist***)** __di__**splayknots**]

**fweight**s are allowed with the second and third syntaxes; see **[U] 11.1.6 weight**.

__Menu__

**Data > Create or change data > Other variable-creation commands > Linear and cubic spline construction**

__Description__

**mkspline** creates variables containing a linear spline or a restricted
cubic spline of an existing variable. For linear splines, knots can be
user specified, equally spaced over the range of the variable, or placed
at percentiles. For restricted cubic splines, also known as natural
splines, knot locations are based on Harrell's (2001) recommended
percentiles or user-specified points.

__Options__

**marginal** is allowed with the first or second syntax. It specifies that
the new variables be constructed so that, when used in estimation,
the coefficients represent the change in the slope from the preceding
interval. The default is to construct the variables so that, when
used in estimation, the coefficients measure the slopes for the
interval.

**displayknots** displays the values of the knots that were used in creating
the linear or restricted cubic spline.

**pctile** is allowed only with the second syntax. It specifies that the
knots be placed at percentiles of the data rather than being equally
spaced over the range.

**nknots(***#***)** is allowed only with the third syntax. It specifies the number
of knots that are to be used for a restricted cubic spline. This
number must be between 3 and 7 unless the knot locations are
specified using **knots()**. The default number of knots is 5.

**knots(***numlist***)** is allowed only with the third syntax. It specifies the
exact location of the knots to be used for a restricted cubic spline.
The values of these knots must be given in increasing order. When
this option is omitted, the default knot values are based on
Harrell's recommended percentiles with the additional restriction
that the smallest knot may not be less than the fifth-smallest value
of *oldvar* and the largest knot may not be greater than the
fifth-largest value of *oldvar*. If both **nknots()** and **knots()** are
given, they must specify the same number of knots.
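
As a reading aid for the **cubic** option, the restricted cubic spline variables can be written in the form sketched below. The notation is introduced here for illustration and is not part of the syntax: k_1 < ... < k_n are the knots, V is *oldvar*, and V_1, ..., V_{n-1} are the created variables. This is the standard restricted cubic spline basis in the scaling described by Harrell (2001); it is not meant as a statement of the exact internal computation.

```latex
\begin{aligned}
V_1 &= V \\
V_{i+1} &= \frac{(V-k_i)_+^3 \;-\; (k_n-k_{n-1})^{-1}
  \left\{ (V-k_{n-1})_+^3\,(k_n-k_i) \;-\; (V-k_n)_+^3\,(k_{n-1}-k_i) \right\}}
  {(k_n-k_1)^2},
  \qquad i = 1,\dots,n-2 \\
(u)_+ &= \max(u,\,0)
\end{aligned}
```

With *n* knots this form yields *n* - 1 variables, so, for example, **nknots(4)** produces three variables. The cubic and quadratic terms cancel beyond the last knot, which is what makes the fitted function linear in both tails.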

__Examples__

Fit a regression of log income on education and age by using a piecewise
linear function for age
**. webuse mksp1**
**. mkspline age1 20 age2 30 age3 40 age4 50 age5 60 age6 = age**
**. regress lninc educ age1-age6**

Fit the model so that the coefficients on the spline variables represent
the change in slope from the preceding interval
**. webuse mksp1, clear**
**. mkspline age1 20 age2 30 age3 40 age4 50 age5 60 age6 = age, marginal**
**. regress lninc educ age1-age6**
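
To make the difference between the two parameterizations concrete, the following sketch rebuilds the spline variables by hand and checks them against **mkspline**. The data, the variable names (x, s*, m*, h*, g*), and the knots at 20 and 40 are all made up for illustration; the hand formulas are an assumption about the construction, which the **assert** lines are there to check.

```stata
* Sketch only: made-up data and knots, not one of the shipped datasets
clear
set obs 61
generate x = _n - 1                      // x = 0, 1, ..., 60

* Default: each variable carries the part of x that falls in one interval
mkspline s1 20 s2 40 s3 = x
generate h1 = min(x, 20)
generate h2 = max(min(x, 40), 20) - 20
generate h3 = max(x, 40) - 40
assert s1 == h1 & s2 == h2 & s3 == h3

* marginal: first variable is x itself, later ones are max(0, x - knot)
mkspline m1 20 m2 40 m3 = x, marginal
generate g2 = max(0, x - 20)
generate g3 = max(0, x - 40)
assert m1 == x & m2 == g2 & m3 == g3
```

Under the default construction, the coefficient on s2 is the slope between 20 and 40. Under **marginal**, the coefficient on m2 is the amount by which the slope changes at 20, so a test of that coefficient is a test of whether the slope differs across the knot.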

Create variables containing a linear spline of dosage with knots chosen
so that data are divided into five groups of equal size
**. webuse mksp2, clear**
**. mkspline dose 5 = dosage, pctile**
**. logistic outcome dose1-dose5**
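
The effect of **pctile** on knot placement is easiest to see on skewed data. The sketch below uses simulated values (the seed, sample size, and variable names are arbitrary choices for illustration) and relies on **displayknots** to print the knots each call selects.

```stata
* Sketch only: simulated right-skewed data
clear
set seed 12345
set obs 1000
generate x = rchi2(2)

* Four variables, hence three knots, equally spaced over the range of x
mkspline e 4 = x, displayknots

* Four variables, hence three knots, placed at the quartiles of x
mkspline p 4 = x, pctile displayknots
```

With skewed data, the equally spaced knots tend to sit where observations are sparse, whereas the percentile-based knots put roughly the same number of observations in each interval.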

Perform a logistic regression of outcome against a restricted cubic
spline function of dosage with four knots chosen according to Harrell's
recommended percentiles
**. webuse mksp2, clear**
**. mkspline dose = dosage, cubic nknots(4)**
**. logistic outcome dose\***
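
The sketch below inspects what the four-knot example creates and compares the selected knots with percentiles computed directly. The percentiles 5, 35, 65, and 95 are assumed here to be Harrell's recommended values for four knots; the two sets of numbers would be expected to agree except where the fifth-smallest or fifth-largest restriction applies, but that agreement is not guaranteed by this sketch.

```stata
webuse mksp2, clear
mkspline dose = dosage, cubic nknots(4) displayknots
matrix K = r(knots)              // copy now; later r-class commands overwrite r()

describe dose1-dose3             // four knots give three spline variables

* Assumed Harrell percentiles for four knots
_pctile dosage, percentiles(5 35 65 95)
display "percentiles: " r(r1) "  " r(r2) "  " r(r3) "  " r(r4)
matrix list K
```

Note that the wildcard dose* in the **logistic** command matches dose1 through dose3 but not dosage, so the original variable is not entered a second time.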

__Stored results__

**mkspline** stores the following in **r()**:

Scalars
**r(N_knots)** number of knots

Matrices
**r(knots)** location of knots
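
One possible use of the stored results is to save the knots from one call and reuse them in a later call, for instance to build identical spline variables in a second dataset. The sketch below uses simulated data; the names sp, again, K, nk, and klist are illustrative, and it assumes **r(knots)** is a row vector with one column per knot.

```stata
* Sketch only: simulated data
clear
set seed 2024
set obs 500
generate x = 100*runiform()

mkspline sp = x, cubic nknots(5)
local nk = r(N_knots)            // number of knots
matrix K = r(knots)              // knot locations (assumed 1 x nk)
display "knots used: `nk'"
matrix list K

* Turn the matrix into a numlist and pass it back through knots()
local klist
forvalues j = 1/`nk' {
    local klist `klist' `=K[1,`j']'
}
mkspline again = x, cubic knots(`klist')
```

Because **r()** is overwritten by the next r-class command, copy the results into a local macro or matrix immediately after **mkspline**.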

__Reference__

Harrell, F. E., Jr. 2001. *Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis*. New York: Springer.