**[R] npregress** -- Nonparametric regression

__Syntax__

**npregress** **kernel** *depvar* *indepvars* [*if*] [*in*] [**,** *options*]

*options* Description
-------------------------------------------------------------------------
Model
**estimator(linear**|**constant)** use the local-linear or local-constant
kernel estimator
__kern__**el(***kernel***)** kernel density function for continuous
covariates
__dkern__**el(***dkernel***)** kernel density function for discrete
covariates
**predict(***prspec***)** store predicted values of the mean and
derivatives using variable names specified
in *prspec*
__noderiv__**atives** suppress derivative computation
**imaic** use improved AIC instead of cross-validation
to compute optimal bandwidth
__unid__**entsample(***newvar***)** specify name of variable that marks
identification problems

Bandwidth
__bw__**idth(***specs***)** specify kernel bandwidth for all predictions
__meanbw__**idth(***specs***)** specify kernel bandwidth for the mean
__derivbw__**idth(***specs***)** specify kernel bandwidth for the derivatives

SE
* **vce(***vcetype***)** *vcetype* may be **none** or __boot__**strap**
__r__**eps(***#***)** equivalent to **vce(bootstrap, reps(***#***))**
**seed(***#***)** set random-number seed to *#*; must also
specify **reps(***#***)**
**bwreplace** vary bandwidth with each bootstrap
replication; seldom used

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
*display_options* control columns and column formats, row
spacing, line width, display of omitted
variables and base and empty cells, and
factor-variable labeling
**citype(***citype***)** method to compute bootstrap confidence
intervals; default is **citype(**__p__**ercentile)**

Maximization
*maximize_options* control the maximization process

__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------
*indepvars* may contain factor variables; see fvvarlist.
**bootstrap**, **by**, and **jackknife** are allowed; see prefix.
* **vce(bootstrap)** reports percentile confidence intervals instead of the
normal-based confidence intervals reported when **vce(bootstrap)** is
specified with other estimation commands.
**coeflegend** does not appear in the dialog box.
See **[R] npregress postestimation** for features available after estimation.

*kernel* Description
-------------------------------------------------------------------------
__ep__**anechnikov** Epanechnikov kernel function; the default
**epan2** alternative Epanechnikov kernel function
__bi__**weight** biweight kernel function
__cos__**ine** cosine trace kernel function
__gau__**ssian** Gaussian kernel function
__par__**zen** Parzen kernel function
__rec__**tangle** rectangle kernel function
__tri__**angle** triangle kernel function
-------------------------------------------------------------------------

*dkernel* Description
-------------------------------------------------------------------------
__li__**racine** Li-Racine kernel function; the default
__cell__**mean** cell means kernel function
-------------------------------------------------------------------------

*citype* Description
-------------------------------------------------------------------------
__p__**ercentile** percentile confidence intervals; the default
**bc** bias-corrected confidence intervals
__nor__**mal** normal-based confidence intervals
-------------------------------------------------------------------------

__Menu__

**Statistics > Nonparametric analysis > Nonparametric regression**

__Description__

**npregress** performs nonparametric local-linear and local-constant kernel
regression. Like linear regression, nonparametric regression models the
mean of the outcome conditional on the covariates, but unlike linear
regression, it makes no assumptions about the functional form of the
relationship between the outcome and the covariates. **npregress** may be
used to model the mean of a continuous, count, or binary outcome.

__Options__

+-------+
----+ Model +------------------------------------------------------------

**estimator(linear**|**constant)** specifies whether the local-linear or
local-constant kernel estimator should be used. The default is
**estimator(linear)**.
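
As a minimal sketch, both estimators can be fit in turn on the **dui** data from the Examples section and compared through the stored results **e(estimator)** and **e(r2)**. The choice of dataset and covariate is illustrative only.

```stata
* Illustrative comparison of the two estimators on the dui example data.
webuse dui, clear

* Local-linear estimator (the default)
npregress kernel citations fines
display "estimator: `e(estimator)'   R-squared: " e(r2)

* Local-constant estimator
npregress kernel citations fines, estimator(constant)
display "estimator: `e(estimator)'   R-squared: " e(r2)
```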

**kernel(***kernel***)** specifies the kernel density function for continuous
covariates for use in calculating the local-constant or local-linear
estimator. The default is **kernel(epanechnikov)**.

**dkernel(***dkernel***)** specifies the kernel density function for discrete
covariates for use in calculating the local-constant or local-linear
estimator. The default is **dkernel(liracine)**; see *Methods and*
*formulas* in **[R] npregress** for details on the Li-Racine kernel. When
**dkernel(cellmean)** is specified, discrete covariates are weighted by
their cell means.
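
The following sketch mixes a continuous and a discrete covariate. It assumes that the **dui** dataset also contains a categorical variable named **csize**; that variable is not mentioned elsewhere in this entry, so substitute any discrete covariate in your own data. Discrete covariates are entered as factor variables.

```stata
* Assumption: dui.dta contains the categorical variable csize.
webuse dui, clear

* Default Li-Racine kernel for the discrete covariate
npregress kernel citations fines i.csize
display "discrete kernel: `e(dkname)'"

* Cell-means kernel for the discrete covariate
npregress kernel citations fines i.csize, dkernel(cellmean)
display "discrete kernel: `e(dkname)'"
```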

**predict(***prspec***)** specifies that **npregress** store the predicted values for
the mean and derivatives of the mean with the specified names.
*prspec* is the following:

**predict(***varlist*|*stub****** [**, replace noderivatives**]**)**

The option takes a variable list or a *stub*. The first variable name
corresponds to the predicted outcome mean. The second name
corresponds to the derivatives of the mean. There is one derivative
for each *indepvar*.

When **replace** is used, variables with the names in *varlist* or *stub******
are replaced by those in the new computation. If **noderivatives** is
specified, only a variable for the mean is created. This speeds up
estimation but adds to the computational burden later if you want to
obtain marginal effects after estimation.
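
A short sketch of the two forms of *prspec* described above, using the **dui** data from the Examples section. The variable names **yhat** and **dfines** are hypothetical and chosen only for this illustration.

```stata
webuse dui, clear

* Store the predicted mean in yhat and the derivative with respect to
* fines in dfines (both names are hypothetical).
npregress kernel citations fines, predict(yhat dfines)
summarize yhat dfines

* Refit with a different kernel, overwrite yhat, and skip the derivative
* variable; dfines from the first fit is left untouched.
npregress kernel citations fines, kernel(gaussian) ///
    predict(yhat, replace noderivatives)
summarize yhat
```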

**noderivatives** suppresses the computation of the derivatives. In this
case, only the mean function is computed.

**imaic** specifies to use the improved AIC instead of cross-validation to
compute optimal bandwidths.

**unidentsample(***newvar***)** specifies the name of a variable that is 1 if the
observation violates the model identification assumptions and is 0
otherwise. By default, this variable is a system variable
(**_unident_sample**).

**npregress** computes a weighted regression for each observation in the
data. An observation violates identification assumptions if the
regression cannot be performed at that point. The regression
formula, which is discussed in detail in *Methods and formulas*, is
given by

gamma = (**Z**'**WZ**)^{-1}**Z**'**Wy**

**npregress** verifies that the matrix (**Z**'**WZ**) is full rank for each
observation to determine identification. Identification problems
commonly arise when the bandwidth is too small, resulting in too few
observations within a bandwidth. Independent variables that are
collinear within the bandwidth can also cause a problem with
identification at that point.

Observations that violate identification assumptions are reported as
missing for the predicted means and derivatives.
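
As a rough sketch, the marker variable can be named explicitly and then inspected after estimation. The name **unid** is hypothetical, and whether any observations are flagged depends on the data and the bandwidth.

```stata
webuse dui, clear

* Name the identification marker unid (hypothetical name) instead of
* relying on the default system variable.
npregress kernel citations fines, unidentsample(unid)

* 1 marks observations at which the local regression could not be
* computed; 0 marks all other observations.
tabulate unid
```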

+-----------+
----+ Bandwidth +--------------------------------------------------------

**bwidth(***specs***)** specifies the half-width of the kernel at each point for
the computation of the mean and the derivatives of the mean function.
If no bandwidth is specified, one is chosen by minimizing the
integrated mean squared error of the prediction.

*specs* specifies bandwidths for the mean and derivative for each
*indepvar* in one of three ways: by specifying the name of a vector
containing the bandwidths (for example, **bwidth(H)**, where **H** is a
properly labeled vector); by specifying the equation and coefficient
names with the corresponding values (for example, **bwidth(Mean:x1=0.5**
**Effect:x1=0.9)**); or by specifying a list of values for the means,
standard errors, and derivatives for *indepvars* given in the order of
the corresponding *indepvars* and specifying the **copy** suboption (for
example, **bwidth(0.5 0.9, copy)**).

**skip** specifies that any parameters found in the specified vector that
are not also found in the model be ignored. The default action
is to issue an error message.

**copy** specifies that the list of values or the vector be copied into
the bandwidth vector by position rather than by name.
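
The by-name and by-position forms of *specs* might look as follows on the **dui** data. The values 0.5 and 0.9 are taken from the examples in the text above; applying them to **fines** is an assumption made for illustration, not a recommended bandwidth.

```stata
webuse dui, clear

* By name: equation Mean holds the bandwidth for the mean and equation
* Effect holds the bandwidth for the derivative.
npregress kernel citations fines, bwidth(Mean:fines=0.5 Effect:fines=0.9)
matrix list e(bwidth)

* By position: the values are copied in order when copy is specified.
npregress kernel citations fines, bwidth(0.5 0.9, copy)
matrix list e(bwidth)
```

Because the bandwidths are supplied, no bandwidth selection is needed, so this form can also be used to refit a model quickly with known values.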

**meanbwidth(***specs***)** specifies the half-width of the kernel at each point
for the computation of the mean function. If no bandwidth is
specified, one is chosen by minimizing the integrated mean squared
error of the prediction. For details on how to specify the
bandwidth, see the description of **bwidth()**, above.

**derivbwidth(***specs***)** specifies the half-width of the kernel at each point
for the computation of the derivatives of the mean. If no bandwidth
is specified, one is chosen by minimizing the integrated mean squared
error of the prediction. For details on how to specify the
bandwidth, see the description of **bwidth()**, above.

+----+
----+ SE +---------------------------------------------------------------

**vce(***vcetype***)** specifies the type of standard error reported. *vcetype*
may be **none** (the default), in which case no standard errors are
reported, or **bootstrap**, in which case bootstrap standard errors are
reported; see **[R]** *vce_option*.

We recommend that you select the number of replications using **reps(***#***)**
instead of specifying **vce(bootstrap)**, which defaults to 50
replications. Be aware that the number of replications needed to
produce good estimates of the standard errors varies depending on the
problem.

When **vce(bootstrap)** is specified, **npregress** reports percentile
confidence intervals as recommended by Cattaneo and Jansson (2017)
instead of reporting the normal-based confidence intervals that are
reported when **vce(bootstrap)** is specified with other commands. Other
types of confidence intervals can be obtained by using the
**citype(***citype***)** option.

**reps(***#***)** specifies the number of bootstrap replications to be performed.
Specifying this option is equivalent to specifying **vce(bootstrap,**
**reps(***#***))**.

**seed(***#***)** sets the random-number seed. You must specify **reps(***#***)** with
**seed(***#***)**.

**bwreplace** computes a different bandwidth for each bootstrap replication.
The default is to compute the bandwidth once and keep it fixed for
each bootstrap replication. This option is seldom used.
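
A minimal sketch of requesting bootstrap standard errors on the **dui** data. The replication count and seed are arbitrary illustrative values; the appropriate number of replications depends on the problem.

```stata
webuse dui, clear

* 100 bootstrap replications with a fixed seed so results can be reproduced
npregress kernel citations fines, reps(100) seed(12345)

* The same request written through vce()
npregress kernel citations fines, vce(bootstrap, reps(100) seed(12345))

* With replications, e(properties) should report that a VCE is available
display "`e(properties)'"
```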

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**, **nocnsreport**; see **[R] estimation options**.

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(%***fmt***)**, **pformat(%***fmt***)**, **sformat(%***fmt***)**, and **nolstretch**; see **[R]**
**estimation options**.

**citype(***citype***)** specifies the type of confidence interval to be computed.
By default, bootstrap percentile confidence intervals are reported as
recommended by Cattaneo and Jansson (2017). *citype* may be one of
**percentile**, **bc**, or **normal**.
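
The reporting options can be combined with the bootstrap options, for example as sketched below on the **dui** data. The replication count, seed, and confidence level are illustrative values.

```stata
webuse dui, clear

* Bias-corrected bootstrap confidence intervals at the 90% level
npregress kernel citations fines, reps(100) seed(12345) citype(bc) level(90)

* Normal-based confidence intervals, as reported by other estimation
* commands when vce(bootstrap) is specified
npregress kernel citations fines, reps(100) seed(12345) citype(normal)
```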

+--------------+
----+ Maximization +-----------------------------------------------------

*maximize_options*: __iter__**ate(***#***)**, [__no__]__lo__**g**, __tr__**ace**, **showstep**, __tol__**erance(***#***)**,
__ltol__**erance(***#***)**, **from(***init_specs***)**; see **[R] maximize**. These options are
seldom used.
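
Should these options be needed, a sketch such as the following caps the number of iterations, suppresses the iteration log, and then checks convergence through the stored results. The iteration limit of 50 is an arbitrary illustrative value.

```stata
webuse dui, clear

* Limit the bandwidth optimization to 50 iterations and hide the log
npregress kernel citations fines, iterate(50) nolog

* 1 if both the mean and effect optimizations converged, 0 otherwise
display e(converged)

* Iteration log for the mean bandwidth (up to 20 iterations are kept)
matrix list e(ilog_mean)
```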

The following option is available with **npregress** but is not shown in the
dialog box:

**coeflegend**; see **[R] estimation options**.

__Examples__

Setup
**. webuse dui**

Nonparametric regression of **citations** as a function of **fines**
**. npregress kernel citations fines**

Same as above, but specify variable names for the mean and derivatives
**. npregress kernel citations fines, predict(mean deriv)**

Use the Gaussian kernel density function
**. npregress kernel citations fines, kernel(gaussian)**

__Stored results__

**npregress** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(mean)** mean of mean function
**e(r2)** R-squared
**e(nh)** expected kernel observations
**e(converged_effect)** **1** if effect optimization converged, **0** otherwise
**e(converged_mean)** **1** if mean optimization converged, **0** otherwise
**e(converged)** **1** if effect and mean optimization converged, **0**
otherwise

Macros
**e(cmd)** **npregress**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(estimator)** **linear** or **constant**
**e(kname)** name of continuous kernel
**e(dkname)** name of discrete kernel
**e(bselector)** criterion function for bandwidth selection
**e(title)** title in estimation output
**e(vce)** *vcetype* specified in **vce()**
**e(properties)** **b** (or **b V** if **reps()** specified)
**e(datasignaturevars)** variables used in calculation of checksum
**e(datasignature)** the checksum
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsok)** predictions allowed by **margins**
**e(marginsprop)** signals to the **margins** command

Matrices
**e(b)** coefficient vector
**e(bwidth)** bandwidth for all predictions
**e(derivbwidth)** bandwidth for the derivative
**e(meanbwidth)** bandwidth for the mean
**e(ilog_mean)** iteration log for mean (up to 20 iterations)
**e(ilog_effect)** iteration log for effects (up to 20 iterations)

Functions
**e(sample)** marks estimation sample
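
As a brief sketch, the stored results listed above can be inspected after any fit. The model below is the first one from the Examples section.

```stata
webuse dui, clear
npregress kernel citations fines

* Show everything that was stored in e()
ereturn list

* Pull out individual scalars, macros, and matrices
display "N = " e(N) ", R-squared = " e(r2)
display "kernel: `e(kname)', bandwidth selector: `e(bselector)'"
matrix list e(b)

* e(sample) marks the observations used in estimation
count if e(sample)
```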

__Reference__

Cattaneo, M. D., and M. Jansson. 2017. Kernel-based semiparametric
estimators: Small bandwidth asymptotics and bootstrap consistency.
Working paper.
http://eml.berkeley.edu/~mjansson/Papers/CattaneoJansson_BootstrappingSemiparametrics.pdf.