Nonparametric series regression

- Discrete and continuous covariates
- B-spline, natural spline, and polynomial basis functions
- Estimates of average derivatives and contrasts
- Additively separable nonparametric models
- Semiparametric regression models
- Optimal knot and polynomial selection
  - Cross-validation
  - Generalized cross-validation
  - AIC
  - BIC
  - Mallows's Cp
- Interface to **margins**
  - Estimates of population and subpopulation means and effects
  - Fully conditional means and effects
  - Confidence intervals
- Graphs via **marginsplot**

Nonparametric series regression (NPSR) estimates mean outcomes for a given set of covariates, just like linear regression. Unlike linear regression, NPSR is agnostic about the functional form of the outcome in terms of the covariates, which means that NPSR is not subject to misspecification error.

In NPSR, you specify the dependent variable and its determinants. NPSR determines the functional form. If you type

.npregress series y x1 x2 x3

you are specifying that

\( y = g(x_1,x_2,x_3) + \epsilon \)

You are placing no functional-form restrictions on \( g()\). \( g()\) is not required to be linear, although it could be.

\( g(x_1, x_2, x_3)=\beta_1x_1+ \beta_2x_2+ \beta_3x_3 \)

\( g()\) is not required to be linear in the parameters, although it could be.

\( g(x_1, x_2, x_3)=\beta_1x_1+ \beta_2x_2^2+ \beta_3x_1^3x_2+ \beta_4x_3 \)

Or \( g()\) could be

\( g(x_1, x_2, x_3) = \beta_1 x_1^{\beta_2} + \cos(x_2 x_3) \)

Or \( g()\) could be anything else you can imagine.

The jargon for this is that \( g()\) is fully nonparametric.

What you specify does not have to be fully nonparametric. You can impose structure. Type

.npregress series y x1 x2 x3, nointeract(x3)

and you are specifying

\( y = g_1(x_1, x_2) + g_2(x_3) + \epsilon \)

Type

.npregress series y x1 x2 x3, nointeract(x2 x3)

and you are specifying

\( y = g_1(x_1) + g_2(x_2) + g_3(x_3) + \epsilon \)

Type

.npregress series y x1 x2, asis(x3)

and you are specifying

\( y = g_1(x_1, x_2) + \beta_3 x_3 + \epsilon \)
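To see what the fully separable restriction means computationally, here is a hypothetical sketch in Python (not Stata's implementation): an additively separable fit amounts to building one basis per covariate and stacking the columns side by side, with no cross-product (interaction) columns. The cubic polynomial bases and the simulated function are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x1, x2, x3 = rng.uniform(0, 1, (3, n))
# Simulated additively separable outcome (true g is unknown to the estimator)
y = np.sin(3 * x1) + x2**2 - x3 + rng.normal(0, 0.1, n)

def poly_basis(x, degree):
    # Polynomial basis columns [x, x^2, ..., x^degree], constant dropped
    return np.vander(x, N=degree + 1, increasing=True)[:, 1:]

# Additively separable design: per-covariate bases, no interaction columns
X = np.column_stack([
    np.ones(n),          # single shared intercept
    poly_basis(x1, 3),
    poly_basis(x2, 3),
    poly_basis(x3, 3),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Allowing interactions between, say, \( x_1 \) and \( x_2 \) would add products of their basis columns to the design; the `nointeract()` option is what rules those columns out.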

You specify how general (how nonparametric) a model you want to fit.

The fitted model is not returned in algebraic form. In fact, the
function is never even found in algebraic form. It is
approximated by a series, and you can choose polynomial series,
natural spline series, or a B-spline series. **npregress series**
reports

- average marginal effects for continuous covariates
- contrasts for discrete covariates

**npregress series** needs more observations than linear regression to
produce consistent estimates, and the number of observations required
grows with the number of covariates and the complexity of \( g()\).
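To build intuition for what a series approximation does, here is a minimal sketch in Python (not Stata) using only NumPy: we regress on a degree-5 polynomial basis and recover a good fit to a nonlinear function that the estimator never sees in algebraic form. The simulated function and the fixed degree are illustrative assumptions; **npregress series** instead chooses the number of terms or knots by the criteria listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the analyst never writes down the true g()
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 500)

# Polynomial series: regress y on [1, x, x^2, ..., x^5] by least squares
X = np.vander(x, N=6, increasing=True)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted series approximates g() pointwise
ghat = X @ beta
mse = float(np.mean((y - ghat) ** 2))
```

The number of basis terms (or knots, for splines) is the tuning parameter; too few terms underfit the unknown function, too many chase the noise, which is why data-driven selection matters.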

We have fictional data on wine output from 512 wine-producing
counties around the world. **output** will be our
dependent variable. We believe **output** is affected by

| Variable | Description |
|----------|-------------|
| **taxlevel** | taxes on wine production |
| **rainfall** | rainfall in mm/hour |
| **irrigate** | whether the winery irrigates |

Our main interest is to see how tax levels affect wine yield,
and we include **rainfall** and **irrigate** as controls so
that the effect of **taxlevel** is correctly measured.

We start by fitting the model.

.npregress series output taxlevel rainfall i.irrigate

```
Computing approximating function

Minimizing cross-validation criterion
Iteration 0:  Cross-validation criterion = 109.7216

Computing average derivatives

Cubic B-spline estimation                     Number of obs     =         512
Criterion: cross-validation                   Number of knots   =           1
------------------------------------------------------------------------------
             |             Robust
      output |     Effect   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    taxlevel |  -296.8132     14.2256   -20.86   0.000    -324.6949   -268.9316
    rainfall |   53.45136    9.427198     5.67   0.000     34.97439    71.92833
             |
    irrigate |
    (1 vs 0) |    8.40677    1.022549     8.22   0.000     6.402611    10.41093
------------------------------------------------------------------------------
```

The output reports effects of -297, 53, and 8.4 for **taxlevel**, **rainfall**, and **irrigate**, respectively.

Start with the -297. **taxlevel** is a continuous variable, so
-297 is an average marginal effect, meaning it is the average
derivative of **output** with respect to **taxlevel** across the
sample. It is what economists call the average marginal effect of
taxes on output. Higher taxes result in lower output.

Now consider the 53, which is also an average marginal effect because
**rainfall** is a continuous variable. Higher rainfall
increases wine output.

Finally, there is the 8.4, which is a contrast because
**irrigate** is a factor (dummy) variable. **irrigate** is 1
if the wine grower irrigates and 0 otherwise. The contrast of 8.4 is
the average effect for a discrete change. It is the difference of
what the mean output would be if all producers irrigated and what
the mean output would be if no producers irrigated. 8.4 means a
positive treatment effect of irrigation.
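As a rough illustration of how these two kinds of effects are computed from a fitted mean function, here is a hypothetical Python sketch. The `ghat()` function is a stand-in for the fitted series approximation, not Stata's actual code, and its coefficients are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the fitted series approximation:
# x is a continuous covariate, d a 0/1 covariate
def ghat(x, d):
    return -3.0 * x**2 + 0.5 * d * x + 2.0 * d

x = rng.uniform(0, 1, 1000)
d = rng.integers(0, 2, 1000)

# Average marginal effect of x: average the numerical derivative
# of ghat with respect to x over the sample
h = 1e-5
ame_x = float(np.mean((ghat(x + h, d) - ghat(x - h, d)) / (2 * h)))

# Contrast for d: mean prediction with d set to 1 for everyone,
# minus mean prediction with d set to 0 for everyone
contrast_d = float(np.mean(ghat(x, 1) - ghat(x, 0)))
```

The average marginal effect averages a derivative over the observed covariate values; the contrast averages a discrete change, holding every other covariate at its observed value in both counterfactuals.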

Do these estimated effects answer your research question? They might,
but if they do not, we can obtain whatever estimated effects we need
using Stata's **margins** command.
If we need to explore the effects of various tax levels, say between
11 and 29 percent, we can type

.margins, at(taxlevel=(.11(.03).29))

(output omitted)

It produces a table of effects and standard errors, which we omitted
because we want to show the result graphically. We do that simply
by typing **marginsplot** after **margins**.

.marginsplot, xtitle("Tax level")

Variables that uniquely identify margins: taxlevel

The effect of taxes is not linear.

You are not restricted to exploring the function one variable
at a time. You could investigate the mean output for different
levels of taxes *and* irrigation by typing

.margins irrigate, at(taxlevel=(.11(.03).29))

You could investigate mean output for different levels of taxes
*and* irrigation *and* rainfall by typing

.margins irrigate, at(taxlevel=(.11(.03).29)) at(rainfall=(.01(.05).33))
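The logic behind such a grid of counterfactual means can be sketched in Python; here `yhat()` is a hypothetical stand-in for the fitted mean function, and the covariate values are simulated for illustration. For each tax level in the grid, we set **taxlevel** to that value for every observation, fix **irrigate** at 0 or 1, keep **rainfall** as observed, and average the predictions.

```python
import numpy as np

# Hypothetical fitted mean function standing in for the series fit
def yhat(taxlevel, rainfall, irrigate):
    return 100 - 300 * taxlevel**2 + 50 * rainfall + 8 * irrigate

rng = np.random.default_rng(2)
rainfall = rng.uniform(0.01, 0.33, 512)  # observed covariate values

# margins-style grid: counterfactual tax levels .11, .14, ..., .29
tax_grid = np.arange(0.11, 0.30, 0.03)
for t in tax_grid:
    m0 = float(np.mean(yhat(t, rainfall, 0)))  # mean output, no irrigation
    m1 = float(np.mean(yhat(t, rainfall, 1)))  # mean output, all irrigate
    print(round(float(t), 2), round(m0, 2), round(m1, 2))
```

Each row of the resulting table is a population-averaged counterfactual mean, which is exactly the quantity **margins** estimates (along with its standard error) from the **npregress series** fit.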

Take a minute to appreciate this. We believe that wine output is a
function of taxes, rainfall, and irrigation, but we do not know the
function. We can nonetheless fit an approximation to the unknown
function and explore it to gain statistical insight using
**npregress series**, **margins**, and **marginsplot**.

Learn more about Stata's nonparametric methods features.

Read more about nonparametric series regression in the *Base Reference Manual*; see **[R] npregress intro** and **[R] npregress series**.