Highlights
Generate spline basis functions for multiple variables at once
B-spline basis functions
Piecewise polynomial basis functions
Restricted cubic spline basis functions
Select the number of knots, provide a knot list, or use a knots matrix
Often, we do not want to make functional form assumptions about the data we analyze. We may want to fit a regression of an outcome on a set of regressors and remain agnostic about how each regressor enters the model. Spline basis functions make this possible: linear combinations of them flexibly approximate an unknown functional form. We may also want to visualize the relationship between an outcome and a regressor, or between any two variables, without assuming linearity or any other particular functional form.
In Stata 18, you can use the new makespline command to generate B-spline, piecewise polynomial spline, and restricted cubic spline basis functions from a list of existing variables. For example, we could type
. makespline bspline x1 x2 x3 x4 ...x100
to form a set of third-order B-spline basis functions for each of the 100 variables from x1 to x100. We can then use any of these basis functions to fit a model while remaining agnostic about the relationship between the covariates and an outcome of interest. Or we could visualize the relationship between the outcome of interest and any of the basis function components that makespline generated.
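To make the fit-and-visualize idea concrete, here is a minimal sketch on simulated data. The variables x1 and y, the seed, and the sine-shaped relationship are all made up for illustration. The sketch relies on makespline's default variable names, which start with _bsp, so the basis terms for the first listed variable are matched by _bsp_1_*.

```stata
* Simulated data: everything in this block is an illustrative assumption
clear
set seed 12345
set obs 500
generate double x1 = runiform(0, 6)
generate double y  = sin(x1) + rnormal(0, 0.3)

* Generate the default third-order B-spline basis for x1 (_bsp_1_1, _bsp_1_2, ...)
makespline bspline x1

* Regress the outcome on the basis terms; regress automatically omits
* any term that turns out to be collinear with the constant
regress y _bsp_1_*

* Linear prediction from the fitted model traces the estimated curve
predict double yhat, xb

* Overlay the fitted curve on the raw data
twoway (scatter y x1, msize(vsmall)) (line yhat x1, sort)
```

No linear or polynomial form is imposed on x1 here; the shape of the fitted curve comes from the estimated coefficients on the basis terms.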
As an example, suppose we would like to estimate the effect of a mother's smoking during pregnancy (mbsmoke) on her infant's birthweight (bweight) using the telasso command. The telasso command lets us model both the outcome (bweight) and the treatment (mbsmoke). We believe that there is a relationship between birthweight and the mother's age (mage), mother's educational attainment (medu), and father's educational attainment (fedu). We also believe that medu is a good predictor of whether a mother smokes during pregnancy.
We are agnostic about the functional form of the relationship between bweight and mage, medu, and fedu. We are also agnostic about the relationship between mbsmoke and medu. telasso does not require us to specify either one: the command selects from a set of candidate covariates and estimates the treatment effect of interest.
We use makespline to form basis functions from each of the covariates of interest.
. makespline bspline mage medu fedu
We generated third-order B-spline basis functions, each consisting of five variables, from mage, medu, and fedu. The variables generated have generic system names, starting with _bsp. If you prefer, you can change the basis names using the basis() option. Below, we show the generated variables:
. describe _bsp*
Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------
_bsp_1_1        double  %10.0g                B-spline basis term 1 for mage
_bsp_1_2        double  %10.0g                B-spline basis term 2 for mage
_bsp_1_3        double  %10.0g                B-spline basis term 3 for mage
_bsp_1_4        double  %10.0g                B-spline basis term 4 for mage
_bsp_1_5        double  %10.0g                B-spline basis term 5 for mage
_bsp_2_1        double  %10.0g                B-spline basis term 1 for medu
_bsp_2_2        double  %10.0g                B-spline basis term 2 for medu
_bsp_2_3        double  %10.0g                B-spline basis term 3 for medu
_bsp_2_4        double  %10.0g                B-spline basis term 4 for medu
_bsp_2_5        double  %10.0g                B-spline basis term 5 for medu
_bsp_3_1        double  %10.0g                B-spline basis term 1 for fedu
_bsp_3_2        double  %10.0g                B-spline basis term 2 for fedu
_bsp_3_3        double  %10.0g                B-spline basis term 3 for fedu
_bsp_3_4        double  %10.0g                B-spline basis term 4 for fedu
_bsp_3_5        double  %10.0g                B-spline basis term 5 for fedu
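If the generic _bsp names are not convenient, the basis() option mentioned above could be used along these lines. This is only a sketch: the stub name sp is hypothetical, the assumption that the generated names begin with that stub in place of _bsp should be checked against [R] makespline, and the data containing mage, medu, and fedu are assumed to be in memory already.

```stata
* Assumes a dataset containing mage, medu, and fedu is already in memory
* "sp" is a hypothetical stub name chosen for illustration
makespline bspline mage medu fedu, basis(sp)

* Assumption: the generated names begin with the stub, so sp* matches them
describe sp*
```

The rest of this example keeps the default _bsp names.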
The B-spline basis function components from mage start with _bsp_1, those from medu with _bsp_2, and those from fedu with _bsp_3. Using these basis functions, we fit the treatment-effects model:
. telasso (bweight c._bsp_1*##c._bsp_2* _bsp_3*) (mbsmoke _bsp_2*)
bweight is an arbitrary function of the basis functions for mage and medu and their interactions (specified by using ##) and of the basis functions for father's education. mbsmoke is an arbitrary function of the basis functions for mother's education. Below are the results:
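. telasso (bweight c._bsp_1*##c._bsp_2* _bsp_3*) (mbsmoke _bsp_2*)

Treatment-effects lasso estimation    Number of observations      =  4,642
Outcome model:   linear               Number of controls          =     40
Treatment model: logit                Number of selected controls =      5

------------------------------------------------------------------------------------------
                       |               Robust
               bweight | Coefficient   Std. err.       z     P>|z|    [95% conf. interval]
-----------------------+------------------------------------------------------------------
ATE                    |
               mbsmoke |
 (Smoker vs Nonsmoker) |  -262.7927    25.19126    -10.43    0.000    -312.1667  -213.4188
-----------------------+------------------------------------------------------------------
POmean                 |
               mbsmoke |
             Nonsmoker |   3409.448    9.377197   363.591    0.000     3391.069   3427.827
------------------------------------------------------------------------------------------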
The basis function variables created by makespline and their interactions produced 40 potential control variables. telasso selected 5 of those controls and used them to estimate an average treatment effect (ATE) of –263 grams. In other words, average birthweight would be 263 grams lower if all mothers smoked than in the counterfactual in which no mother smoked.
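As a quick check on how the two reported estimates fit together, the ATE is the difference between the two potential-outcome means, so the output also implies the average birthweight if all mothers smoked. The symbols y_1 and y_0 (birthweight with and without smoking) are notation introduced here for illustration; they do not appear in the output.

```latex
\[
\widehat{\mathrm{ATE}} = \widehat{E}(y_1) - \widehat{E}(y_0)
\quad\Longrightarrow\quad
\widehat{E}(y_1) = 3409.448 + (-262.7927) \approx 3146.66 \text{ grams}
\]
```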
Read more in the Stata Base Reference Manual; see [R] makespline.
View all the new features in Stata 18.