New spline functions

Highlights

  • Generate spline basis functions for multiple variables at once

  • B-spline basis functions

  • Piecewise polynomial basis functions

  • Restricted cubic spline basis functions

  • Select the number of knots, provide a knot list, or use a knots matrix

Often, we do not want to impose a functional form on the data we analyze. We may want to regress an outcome on a set of regressors while remaining agnostic about how each regressor enters the model. Spline basis functions provide flexible approximations to that unknown functional form. We may also want to visualize the relationship between an outcome and a regressor, or between any two variables, without claiming linearity or any other particular form; splines serve that purpose too.

In Stata 18, you can use the new makespline command to generate B-spline, piecewise polynomial spline, and restricted cubic spline basis functions from a list of existing variables. For example, we could type

. makespline bspline x1 x2 x3 x4 ...x100

to form 100 third-order B-spline basis functions, one for each variable from x1 to x100. We can now use any of the basis functions to fit a model and remain agnostic about the relationship between the covariates and an outcome of interest. Or we could visualize the relationship between the outcome of interest and any of the basis function components that makespline generated.
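To make the idea of a spline basis concrete, here is a minimal sketch in Python (outside Stata) of how a cubic B-spline basis can be evaluated with the Cox-de Boor recursion. The function name bspline_basis, the knot values, and the clamped knot vector are illustrative assumptions; makespline's own knot placement and defaults are documented in [R] makespline and may differ from this sketch.

```python
def bspline_basis(x, knots, degree=3):
    """Return the values at x of every B-spline basis function of the given
    degree on a clamped knot vector built from `knots` (boundary knots
    first and last, interior knots in between)."""
    # Repeat each boundary knot `degree` extra times (clamped knot vector).
    t = [knots[0]] * degree + list(knots) + [knots[-1]] * degree

    # Degree 0: indicator of the half-open knot interval containing x.
    b = [1.0 if t[i] <= x < t[i + 1] else 0.0 for i in range(len(t) - 1)]
    if x == t[-1]:
        # Close the last nonempty interval so the right boundary is covered.
        last = max(i for i in range(len(t) - 1) if t[i] < t[i + 1])
        b[last] = 1.0

    # Cox-de Boor recursion: build degree d from degree d - 1.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(t) - d - 1):
            left = right = 0.0
            if t[i + d] != t[i]:
                left = (x - t[i]) / (t[i + d] - t[i]) * b[i]
            if t[i + d + 1] != t[i + 1]:
                right = (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * b[i + 1]
            nxt.append(left + right)
        b = nxt
    return b


# Hypothetical covariate range 13 to 45 with a single interior knot at 26.
knots = [13.0, 26.0, 45.0]
terms = bspline_basis(30.0, knots)
print(len(terms), terms)
```

With these assumed knots, each value of the covariate maps to a handful of nonnegative terms that sum to 1, and a linear model in those terms is a piecewise cubic in the original covariate. That is the sense in which regressing on basis terms leaves the functional form unspecified.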

Let's see it work

We would like to see the effect of a mother's smoking during pregnancy (mbsmoke) on her infant's birthweight (bweight) using the telasso command. The telasso command lets us model both the outcome (bweight) and the treatment (mbsmoke). We believe that there is a relationship between birthweight and the mother's age (mage), mother's educational attainment (medu), and father's educational attainment (fedu). We also believe that medu is a good predictor of whether a mother smokes during pregnancy.

We are agnostic about the functional form of the relationship between bweight and mage, medu, and fedu. We are also agnostic about the relationship between mbsmoke and medu. telasso does not require us to specify either: the command selects from a set of candidate covariates and estimates the treatment effect of interest.

We use makespline to form basis functions from each of the covariates of interest.

. makespline bspline mage medu fedu

We generated third-order B-spline basis functions, each consisting of five variables, from mage, medu, and fedu. The variables generated have generic system names, starting with _bsp. If you prefer, you can change the basis names using the basis() option. Below, we show the generated variables:

. describe _bsp*

Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------
_bsp_1_1        double  %10.0g                B-spline basis term 1 for mage
_bsp_1_2        double  %10.0g                B-spline basis term 2 for mage
_bsp_1_3        double  %10.0g                B-spline basis term 3 for mage
_bsp_1_4        double  %10.0g                B-spline basis term 4 for mage
_bsp_1_5        double  %10.0g                B-spline basis term 5 for mage
_bsp_2_1        double  %10.0g                B-spline basis term 1 for medu
_bsp_2_2        double  %10.0g                B-spline basis term 2 for medu
_bsp_2_3        double  %10.0g                B-spline basis term 3 for medu
_bsp_2_4        double  %10.0g                B-spline basis term 4 for medu
_bsp_2_5        double  %10.0g                B-spline basis term 5 for medu
_bsp_3_1        double  %10.0g                B-spline basis term 1 for fedu
_bsp_3_2        double  %10.0g                B-spline basis term 2 for fedu
_bsp_3_3        double  %10.0g                B-spline basis term 3 for fedu
_bsp_3_4        double  %10.0g                B-spline basis term 4 for fedu
_bsp_3_5        double  %10.0g                B-spline basis term 5 for fedu

The B-spline basis function components from mage start with _bsp_1, from medu with _bsp_2, and from fedu with _bsp_3. Using these basis functions, we fit the treatment-effects model:

. telasso (bweight c._bsp_1*##c._bsp_2* _bsp_3*) (mbsmoke _bsp_2*)

bweight is modeled as an arbitrary function of the basis functions for mage and medu and their interactions (specified by using ##) and of the basis functions for father's education. mbsmoke is modeled as an arbitrary function of the basis functions for mother's education. Below are the results:

. telasso (bweight c._bsp_1*##c._bsp_2* _bsp_3*) (mbsmoke _bsp_2*) 

Treatment-effects lasso estimation     Number of observations      =      4,642
Outcome model:   linear                Number of controls          =         40
Treatment model: logit                 Number of selected controls =          5

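-------------------------------------------------------------------------------------------
                        |               Robust
                bweight | Coefficient   Std. err.        z    P>|z|    [95% conf. interval]
------------------------+------------------------------------------------------------------
ATE                     |
                mbsmoke |
  (Smoker vs Nonsmoker) |   -262.7927    25.19126   -10.43    0.000    -312.1667   -213.4188
------------------------+------------------------------------------------------------------
POmean                  |
                mbsmoke |
              Nonsmoker |    3409.448    9.377197   363.59    0.000     3391.069    3427.827
-------------------------------------------------------------------------------------------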
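
As a quick arithmetic check on the ATE row, the z statistic and the 95% confidence interval can be recomputed from the reported coefficient and robust standard error. This Python sketch assumes the usual normal-based Wald construction (z = coefficient / std. err., interval = coefficient ± critical value × std. err.); the variable names are illustrative.

```python
from statistics import NormalDist

# Values copied from the ATE row of the telasso output.
coef, se = -262.7927, 25.19126

z = coef / se
crit = NormalDist().inv_cdf(0.975)  # two-sided 95% normal critical value
lower, upper = coef - crit * se, coef + crit * se

print(f"z = {z:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The recomputed values agree with the table up to rounding of the displayed inputs.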

The basis function variables created by makespline and their interactions produced 40 potential control variables. telasso selected 5 of those controls and used them to estimate an average treatment effect of -263 grams. In other words, average birthweight would be 263 grams lower if all mothers smoked than if no mother smoked.
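
One accounting consistent with the reported 40 controls is sketched below in Python: five basis terms each for mage, medu, and fedu, plus the 5 × 5 pairwise interactions that ## adds between the mage and medu terms. How telasso tallies controls internally is an assumption here; the helper name terms is hypothetical, while the variable names follow the describe output.

```python
from itertools import product


def terms(var_index, n_terms=5):
    """Names of the basis variables makespline created for one covariate."""
    return [f"_bsp_{var_index}_{k}" for k in range(1, n_terms + 1)]


mage, medu, fedu = terms(1), terms(2), terms(3)

# ## expands to the main effects plus every pairwise product.
interactions = [f"c.{a}#c.{b}" for a, b in product(mage, medu)]
outcome_controls = mage + medu + interactions + fedu

# The treatment model uses only the medu terms, which are already counted.
all_controls = set(outcome_controls) | set(medu)
print(len(all_controls))  # 5 + 5 + 25 + 5 = 40
```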

Tell me more

Read more in the Stata Base Reference Manual; see [R] makespline.
