Nonparametric series regression

Highlights

Nonparametric series regression
    Discrete and continuous covariates
    B-spline, natural spline, and polynomial basis functions
    Estimates of average derivatives and contrasts
    Additively separable nonparametric model
    Semiparametric regression models
    Optimal knot and polynomial selection
        Cross-validation
        Generalized cross-validation
        AIC
        BIC
        Mallows's Cp

Interface to margins
    Estimates of population and subpopulation means and effects
    Fully conditional means and effects
    Confidence intervals
    Graphs via marginsplot


Nonparametric series regression (NPSR) estimates mean outcomes for a given set of covariates, just like linear regression. Unlike linear regression, NPSR makes no assumption about the functional form that relates the outcome to the covariates, so it is not subject to functional-form misspecification.

In NPSR, you specify the dependent variable and its determinants. NPSR determines the functional form. If you type

. npregress series y x1 x2 x3

you are specifying that

\( y = g(x_1,x_2,x_3) + \epsilon \)

You are placing no functional-form restrictions on \( g()\). \( g()\) is not required to be linear, although it could be.

\( g(x_1, x_2, x_3)=\beta_1x_1+ \beta_2x_2+ \beta_3x_3 \)

\( g()\) is not required to be linear in the parameters, although it could be.

\( g(x_1, x_2, x_3)=\beta_1x_1+ \beta_2x_2^2+ \beta_3x_1^3x_2+ \beta_4x_3 \)

Or \( g()\) could be

\( g(x_1, x_2, x_3) = \beta_1 x_1^{\beta_2} + \cos(x_2 x_3) \)

Or \( g()\) could be anything else you can imagine.

The jargon for this is that \( g()\) is fully nonparametric.

What you specify does not have to be fully nonparametric. You can impose structure. Type

. npregress series y x1 x2 x3, nointeract(x3)

and you are specifying

\( y = g_1(x_1, x_2) + g_2(x_3) + \epsilon \)

Type

. npregress series y x1 x2 x3, nointeract(x2 x3)

and you are specifying

\( y = g_1(x_1) + g_2(x_2) + g_3(x_3) + \epsilon \)

Type

. npregress series y x1 x2, asis(x3)

and you are specifying

\( y = g_1(x_1, x_2) + \beta_3 x_3 + \epsilon \)

You decide how general, that is, how nonparametric, the model you fit will be.

The fitted model is not returned in algebraic form. In fact, the function is never even found in algebraic form. It is approximated by a series, and you can choose a polynomial, natural spline, or B-spline series. npregress series reports

average marginal effects for continuous covariates
contrasts for discrete covariates
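
As a rough sketch of what "approximated by a series" means, consider a single covariate \( x \) and a polynomial basis. The symbols \( K \) and \( \beta_k \) below are illustrative notation introduced here, not something npregress series displays:

\( g(x) \approx \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_K x^K \)

The coefficients are estimated by least squares, and the number of terms \( K \) (or, for a spline basis, the number of knots) is picked by a criterion such as cross-validation. Natural spline and B-spline bases replace the powers of \( x \) with piecewise polynomials joined at the knots.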

npregress series needs more observations than linear regression to produce consistent estimates, and the number of observations required grows with the number of covariates and the complexity of \( g()\).
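
To experiment before turning to real data, a small simulation can help. The following is only a sketch: the variable names, the seed, the sample size, and the sine-shaped mean function are all made up for illustration, and what you see will depend on those choices.

. clear
. set seed 12345
. set obs 1000
. generate x = runiform()
. generate y = sin(6*x) + rnormal(0, 0.3)
. npregress series y x
. npregress series y x, polynomial criterion(aic)
. npregress series y x, spline criterion(bic)

The first call relies on the defaults, a cubic B-spline basis with the number of knots chosen by cross-validation. The other two switch to a polynomial basis and a natural spline basis and select the order or the number of knots by AIC and BIC instead. Rerunning with a smaller value in set obs gives a feel for how the estimates degrade as the sample shrinks.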

Let's see it work

We have fictional data on wine output from 512 wine-producing counties around the world. output will be our dependent variable. We believe output is affected by

taxlevel	taxes on wine production
rainfall	rainfall in mm/hour
irrigate	whether winery irrigates

Our main interest is to see how tax levels affect wine yield, and we include rainfall and irrigate as controls so that the effect of taxlevel is correctly measured.

We start by fitting the model.

. npregress series output taxlevel rainfall i.irrigate

Computing approximating function

Minimizing cross-validation criterion

Iteration 0:  Cross-validation criterion =  109.7216

Computing average derivatives

Cubic B-spline estimation                  Number of obs      =            512
Criterion: cross-validation                Number of knots    =              1



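------------------------------------------------------------------------------
             |               Robust
      output |     Effect   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    taxlevel |  -296.8132    14.2256   -20.86   0.000    -324.6949   -268.9316
    rainfall |   53.45136   9.427198     5.67   0.000     34.97439    71.92833
             |
    irrigate |
   (1 vs 0)  |    8.40677   1.022549     8.22   0.000     6.402611    10.41093
------------------------------------------------------------------------------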

Note: Effect estimates are averages of derivatives for continuous covariates
      and averages of contrasts for factor covariates.

The output reports effects of -297, 53, and 8.4 for taxlevel, rainfall, and irrigate, respectively.

Start with the -297. taxlevel is a continuous variable, so -297 is the average derivative of output with respect to taxlevel, which is what economists would call the average marginal effect of taxes on output. On average, higher taxes result in lower output.
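
Written out, and using \( \hat{g} \) for the fitted function and \( n = 512 \) for the number of observations (notation introduced here for illustration), the reported effect for taxlevel is roughly

\( \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat{g}(\text{taxlevel}_i, \text{rainfall}_i, \text{irrigate}_i)}{\partial\, \text{taxlevel}} \approx -297 \)

That is, the derivative is evaluated at each observation's own covariate values and then averaged, so the single number summarizes a slope that may differ from one observation to the next.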

Now consider the 53, which is also an average marginal effect because rainfall is a continuous variable. Higher rainfall increases wine output.

Finally, there is the 8.4, which is a contrast because irrigate is a factor (dummy) variable. irrigate is 1 if the wine grower irrigates and 0 otherwise. The contrast of 8.4 is the average effect of a discrete change. It is the difference between what the mean output would be if all producers irrigated and what the mean output would be if no producers irrigated. 8.4 means a positive treatment effect of irrigation.
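
In the same spirit, the contrast can be sketched as an average of differences, with \( \hat{g} \) denoting the fitted function and \( n \) the number of observations (again, notation introduced here for illustration):

\( \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{g}(\text{taxlevel}_i, \text{rainfall}_i, 1) - \hat{g}(\text{taxlevel}_i, \text{rainfall}_i, 0) \right] \approx 8.4 \)

Each producer keeps its observed tax level and rainfall; only irrigate is switched from 0 to 1.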

Do these estimated effects answer your research question? They might, but if they do not, we can obtain whatever estimates we need using Stata's margins command. If we want to explore mean output at various tax levels, say from 11 to 29 percent in steps of 3 percentage points, we can type

. margins, at(taxlevel=(.11(.03).29))
   (output omitted)

It produces a table of estimated means and standard errors that we omitted because we want to show the result graphically, which we do simply by typing marginsplot after margins.

. marginsplot, xtitle("Tax level")

Variables that uniquely identify margins: taxlevel

The effect of taxes is not linear.

You are not restricted to exploring the function one variable at a time. You could investigate the mean output for different levels of taxes and irrigation by typing

. margins irrigate, at(taxlevel=(.11(.03).29))

You could investigate mean output for different levels of taxes, irrigation, and rainfall by typing

. margins irrigate, at(taxlevel=(.11(.03).29)) at(rainfall=(.01(.05).33))
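
One detail worth checking in your own work: when margins is given several at() options, it treats each one as a separate scenario, whereas several variables listed inside a single at() are combined with one another. If the goal is every combination of tax level and rainfall, a sketch along the following lines may be closer to what is wanted. It uses the same fictional variables, and the coarser grids are an illustrative choice to keep the table small:

. margins irrigate, at(taxlevel=(.11(.06).29) rainfall=(.01(.16).33))

Here taxlevel takes 4 values and rainfall takes 3, so with the two levels of irrigate the table holds 24 estimated means.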

Take a minute to appreciate this. We believe that wine output is a function of taxes, rainfall, and irrigation, but we do not know the function. We can nonetheless fit an approximation to the unknown function and explore it to gain statistical insight using npregress series, margins, and marginsplot.

Tell me more

Learn more about Stata's nonparametric methods features.

Read more about nonparametric series regression in the Base Reference Manual; see [R] npregress intro and [R] npregress series.
