Stata 15 help for npregress

[R] npregress -- Nonparametric regression

Syntax

npregress kernel depvar indepvars [if] [in] [, options]

options                      Description
-------------------------------------------------------------------------
Model
  estimator(linear|constant) use the local-linear or local-constant kernel
                               estimator
  kernel(kernel)             kernel density function for continuous
                               covariates
  dkernel(dkernel)           kernel density function for discrete covariates
  predict(prspec)            store predicted values of the mean and
                               derivatives using variable names specified
                               in prspec
  noderivatives              suppress derivative computation
  imaic                      use improved AIC instead of cross-validation
                               to compute optimal bandwidth
  unidentsample(newvar)      specify name of variable that marks
                               identification problems

Bandwidth
  bwidth(specs)              specify kernel bandwidth for all predictions
  meanbwidth(specs)          specify kernel bandwidth for the mean
  derivbwidth(specs)         specify kernel bandwidth for the derivatives

SE
* vce(vcetype)               vcetype may be none or bootstrap
  reps(#)                    equivalent to vce(bootstrap, reps(#))
  seed(#)                    set random-number seed to #; must also specify
                               reps(#)
  bwreplace                  vary bandwidth with each bootstrap
                               replication; seldom used

Reporting
  level(#)                   set confidence level; default is level(95)
  display_options            control columns and column formats, row
                               spacing, line width, display of omitted
                               variables and base and empty cells, and
                               factor-variable labeling
  citype(citype)             method to compute bootstrap confidence
                               intervals; default is citype(percentile)

Maximization
  maximize_options           control the maximization process

  coeflegend                 display legend instead of statistics
-------------------------------------------------------------------------
indepvars may contain factor variables; see fvvarlist.
bootstrap, by, and jackknife are allowed; see prefix.
* vce(bootstrap) reports percentile confidence intervals instead of the
  normal-based confidence intervals reported when vce(bootstrap) is
  specified with other estimation commands.
coeflegend does not appear in the dialog box.
See [R] npregress postestimation for features available after estimation.

kernel            Description
-------------------------------------------------------------------------
epanechnikov      Epanechnikov kernel function; the default
epan2             alternative Epanechnikov kernel function
biweight          biweight kernel function
cosine            cosine trace kernel function
gaussian          Gaussian kernel function
parzen            Parzen kernel function
rectangle         rectangle kernel function
triangle          triangle kernel function
-------------------------------------------------------------------------

dkernel           Description
-------------------------------------------------------------------------
liracine          Li-Racine kernel function; the default
cellmean          cell means kernel function
-------------------------------------------------------------------------

citype            Description
-------------------------------------------------------------------------
percentile        percentile confidence intervals; the default
bc                bias-corrected confidence intervals
normal            normal-based confidence intervals
-------------------------------------------------------------------------

Menu

Statistics > Nonparametric analysis > Nonparametric regression

Description

npregress performs nonparametric local-linear and local-constant kernel regression. Like linear regression, nonparametric regression models the mean of the outcome conditional on the covariates, but unlike linear regression, it makes no assumptions about the functional form of the relationship between the outcome and the covariates. npregress may be used to model the mean of a continuous, count, or binary outcome.

Options

    +-------+
----+ Model +------------------------------------------------------------

estimator(linear|constant) specifies whether the local-constant or local-linear kernel estimator should be used. The default is estimator(linear).

kernel(kernel) specifies the kernel density function for continuous covariates for use in calculating the local-constant or local-linear estimator. The default is kernel(epanechnikov).

dkernel(dkernel) specifies the kernel density function for discrete covariates for use in calculating the local-constant or local-linear estimator. The default is dkernel(liracine); see Methods and formulas in [R] npregress for details on the Li-Racine kernel. When dkernel(cellmean) is specified, discrete covariates are weighted by their cell means.
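
As a rough sketch of how the Model options above combine, the commands below fit a local-constant estimator with a Gaussian kernel and then add a discrete covariate weighted by cell means. They assume the dui dataset used in the Examples section. The variable name college is an assumption about that dataset and is not stated in this help file.

```stata
* Assumes the dui data used in the Examples section
webuse dui, clear

* Local-constant estimator with a Gaussian kernel for the continuous covariate
npregress kernel citations fines, estimator(constant) kernel(gaussian)

* Discrete covariate entered as a factor variable and weighted by cell means
* (college is assumed to be a 0/1 variable in dui)
npregress kernel citations fines i.college, dkernel(cellmean)
```

The kernel() option applies only to continuous covariates and dkernel() only to discrete ones, so each command changes just one of the defaults.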

predict(prspec) specifies that npregress store the predicted values for the mean and derivatives of the mean with the specified names. prspec is the following:

predict(varlist|stub* [, replace noderivatives])

The option takes a variable list or a stub. The first variable name corresponds to the predicted outcome mean. The second name corresponds to the derivatives of the mean. There is one derivative for each indepvar.

When replace is used, variables with the names in varlist or stub* are replaced by those in the new computation. If noderivatives is specified, only a variable for the mean is created. This speeds up estimation, but the derivatives must then be computed later if you want to obtain marginal effects after estimation, which adds to the computational burden at that stage.

noderivatives suppresses the computation of the derivatives. In this case, only the mean function is computed.

imaic specifies to use the improved AIC instead of cross-validation to compute optimal bandwidths.

unidentsample(newvar) specifies the name of a variable that is 1 if the observation violates the model identification assumptions and is 0 otherwise. By default, this variable is a system variable (_unident_sample).

npregress computes a weighted regression for each observation in the data. An observation violates identification assumptions if the regression cannot be performed at that point. The regression formula, which is discussed in detail in Methods and formulas, is given by

gamma = (Z'WZ)^{-1}Z'Wy

npregress verifies that the matrix (Z'WZ) is full rank for each observation to determine identification. Identification problems commonly arise when the bandwidth is too small, resulting in too few observations within a bandwidth. Independent variables that are collinear within the bandwidth can also cause a problem with identification at that point.

Observations that violate identification assumptions are reported as missing for the predicted means and derivatives.
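
The following sketch shows one way to inspect which observations were flagged. The variable name unid is hypothetical, and the example assumes the dui dataset from the Examples section. With the default bandwidth, the count may well be zero.

```stata
* Assumes the dui data used in the Examples section
webuse dui, clear

* Store the identification marker in a user-named variable (unid is hypothetical)
npregress kernel citations fines, unidentsample(unid)

* Number of observations where the local regression could not be computed
count if unid == 1

* Inspect the flagged observations, if any
list citations fines if unid == 1
```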

    +-----------+
----+ Bandwidth +--------------------------------------------------------

bwidth(specs) specifies the half-width of the kernel at each point for the computation of the mean and the derivatives of the mean function. If no bandwidth is specified, one is chosen by minimizing the integrated mean squared error of the prediction.

specs specifies bandwidths for the mean and derivative for each indepvar in one of three ways: by specifying the name of a vector containing the bandwidths (for example, bwidth(H), where H is a properly labeled vector); by specifying the equation and coefficient names with the corresponding values (for example, bwidth(Mean:x1=0.5 Effect:x1=0.9)); or by specifying a list of values for the means, standard errors, and derivatives for indepvars given in the order of the corresponding indepvars and specifying the copy suboption (for example, bwidth(0.5 0.9, copy)).

skip specifies that any parameters found in the specified vector that are not also found in the model be ignored. The default action is to issue an error message.

copy specifies that the list of values or the vector be copied into the bandwidth vector by position rather than by name.
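
The three ways of supplying specs can be sketched as follows, assuming the dui dataset from the Examples section. The bandwidth values 0.5 and 0.9 are illustrative only, and the last step assumes that the e(bwidth) matrix left behind by a previous fit carries the labels that bwidth() expects.

```stata
* Assumes the dui data used in the Examples section
webuse dui, clear

* By equation and coefficient name (values are illustrative)
npregress kernel citations fines, bwidth(Mean:fines=0.5 Effect:fines=0.9)

* By position, using the copy suboption
npregress kernel citations fines, bwidth(0.5 0.9, copy)

* By vector: reuse the bandwidths stored by the previous fit
* (assumes e(bwidth) is labeled as bwidth() requires)
matrix H = e(bwidth)
npregress kernel citations fines, bwidth(H)
```

Supplying a bandwidth skips the bandwidth search, which can be convenient when refitting the same model repeatedly.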

meanbwidth(specs) specifies the half-width of the kernel at each point for the computation of the mean function. If no bandwidth is specified, one is chosen by minimizing the integrated mean squared error of the prediction. For details on how to specify the bandwidth, see the description of bwidth(), above.

derivbwidth(specs) specifies the half-width of the kernel at each point for the computation of the derivatives of the mean. If no bandwidth is specified, one is chosen by minimizing the integrated mean squared error of the prediction. For details on how to specify the bandwidth, see the description of bwidth(), above.

    +----+
----+ SE +---------------------------------------------------------------

vce(vcetype) specifies the type of standard error reported. vcetype may be none (the default), in which case no standard errors are reported, or bootstrap, in which case bootstrap standard errors are reported; see [R] vce_option.

We recommend that you select the number of replications using reps(#) instead of specifying vce(bootstrap), which defaults to 50 replications. Be aware that the number of replications needed to produce good estimates of the standard errors varies depending on the problem.

When vce(bootstrap) is specified, npregress reports percentile confidence intervals as recommended by Cattaneo and Jansson (2017) instead of reporting the normal-based confidence intervals that are reported when vce(bootstrap) is specified with other commands. Other types of confidence intervals can be obtained by using the citype(citype) option.

reps(#) specifies the number of bootstrap replications to be performed. Specifying this option is equivalent to specifying vce(bootstrap, reps(#)).

seed(#) sets the random-number seed. You must specify reps(#) with seed(#).

bwreplace computes a different bandwidth for each bootstrap replication. The default is to compute the bandwidth once and keep it fixed for each bootstrap replication. This option is seldom used.
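
A minimal sketch of the SE options, assuming the dui dataset from the Examples section. The replication count and seed are arbitrary choices for illustration, not recommendations, and bootstrapping a nonparametric fit can take a while to run.

```stata
* Assumes the dui data used in the Examples section
webuse dui, clear

* Bootstrap standard errors with 200 replications and a fixed seed;
* percentile confidence intervals are reported by default
npregress kernel citations fines, reps(200) seed(12345)

* Same replications and seed, but with bias-corrected confidence intervals
npregress kernel citations fines, reps(200) seed(12345) citype(bc)
```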

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#), nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

citype(citype) specifies the type of confidence interval to be computed. By default, bootstrap percentile confidence intervals are reported as recommended by Cattaneo and Jansson (2017). citype may be one of percentile, bc, or normal.

    +--------------+
----+ Maximization +-----------------------------------------------------

maximize_options: iterate(#), [no]log, trace, showstep, tolerance(#), ltolerance(#), from(init_specs); see [R] maximize. These options are seldom used.

The following option is available with npregress but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Examples

Setup
    . webuse dui

Nonparametric regression of citations as a function of fines
    . npregress kernel citations fines

Same as above, but specify variable names for the mean and derivatives
    . npregress kernel citations fines, predict(mean deriv)

Use the Gaussian kernel density function
    . npregress kernel citations fines, kernel(gaussian)

Stored results

npregress stores the following in e():

Scalars
  e(N)                   number of observations
  e(mean)                mean of mean function
  e(r2)                  R-squared
  e(nh)                  expected kernel observations
  e(converged_effect)    1 if effect optimization converged, 0 otherwise
  e(converged_mean)      1 if mean optimization converged, 0 otherwise
  e(converged)           1 if effect and mean optimization converged, 0
                           otherwise

Macros
  e(cmd)                 npregress
  e(cmdline)             command as typed
  e(depvar)              name of dependent variable
  e(estimator)           linear or constant
  e(kname)               name of continuous kernel
  e(dkname)              name of discrete kernel
  e(bselector)           criterion function for bandwidth selection
  e(title)               title in estimation output
  e(vce)                 vcetype specified in vce()
  e(properties)          b (or b V if reps() specified)
  e(datasignaturevars)   variables used in calculation of checksum
  e(datasignature)       the checksum
  e(estat_cmd)           program used to implement estat
  e(predict)             program used to implement predict
  e(marginsok)           predictions allowed by margins
  e(marginsprop)         signals to the margins command

Matrices
  e(b)                   coefficient vector
  e(bwidth)              bandwidth for all predictions
  e(derivbwidth)         bandwidth for the derivative
  e(meanbwidth)          bandwidth for the mean
  e(ilog_mean)           iteration log for mean (up to 20 iterations)
  e(ilog_effect)         iteration log for effects (up to 20 iterations)

Functions
  e(sample)              marks estimation sample
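
As an illustration of how the stored results might be inspected after a fit, the sketch below assumes the dui dataset from the Examples section and uses only the e() names listed above.

```stata
* Assumes the dui data used in the Examples section
webuse dui, clear
npregress kernel citations fines

* List everything stored in e()
ereturn list

* Pick out individual scalars and a matrix
display "R-squared: " e(r2)
display "Converged: " e(converged)
matrix list e(bwidth)

* Count the observations in the estimation sample
count if e(sample)
```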

Reference

Cattaneo, M. D., and M. Jansson. 2017. Kernel-based semiparametric estimators: Small bandwidth asymptotics and bootstrap consistency. Working paper. http://eml.berkeley.edu/~mjansson/Papers/CattaneoJansson_BootstrappingSemiparametrics.pdf.

