**[TE] teffects nnmatch** -- Nearest-neighbor matching

__Syntax__

**teffects** **nnmatch** **(***ovar* *omvarlist***)** **(***tvar***)** [*if*] [*in*] [*weight*] [**,** *stat*
*options*]

*ovar* is a binary, count, continuous, fractional, or nonnegative outcome
of interest.

*omvarlist* specifies the covariates in the outcome model.

*tvar* must contain integer values representing the treatment levels. Only
two treatment levels are allowed.

*stat* Description
-------------------------------------------------------------------------
Stat
**ate** estimate average treatment effect in population;
the default
**atet** estimate average treatment effect on the treated
-------------------------------------------------------------------------

*options* Description
-------------------------------------------------------------------------
Model
__nn__**eighbor(***#***)** specify number of matches per observation;
default is **nneighbor(1)**
__bias__**adj(***varlist***)** correct for large-sample bias using specified
variables
__e__**match(***varlist***)** match exactly on specified variables

SE/Robust
**vce(***vcetype***)** *vcetype* may be
**vce(**__r__**obust** [**,** **nn(***#***)**]**)**; use robust Abadie-Imbens
standard errors with *#* matches
**vce(iid)**; use default Abadie-Imbens standard
errors

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
__dmv__**ariables** display names of matching variables
*display_options* control columns and column formats, row spacing,
line width, display of omitted variables and
base and empty cells, and factor-variable
labeling

Advanced
__cal__**iper(***#***)** specify the maximum distance for which two
observations are potential neighbors
__dtol__**erance(***#***)** set maximum distance between individuals
considered equal
__os__**ample(***newvar***)** *newvar* identifies observations that violate the
overlap assumption
__con__**trol(***#* | *label***)** specify the level of *tvar* that is the control
__tle__**vel(***#* | *label***)** specify the level of *tvar* that is the treatment
__gen__**erate(***stub***)** generate variables containing the observation
numbers of the nearest neighbors
__m__**etric(***metric***)** select distance metric for covariates

__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------

*metric* Description
-------------------------------------------------------------------------
__maha__**lanobis** inverse sample covariate covariance; the default
__ivar__**iance** inverse diagonal sample covariate covariance
__eucl__**idean** identity
__mat__**rix** *matname* user-supplied scaling matrix
-------------------------------------------------------------------------

*omvarlist* may contain factor variables; see fvvarlist.
**by** and **statsby** are allowed; see prefix.
**fweight**s are allowed; see weight.
**coeflegend** does not appear in the dialog box.
See **[TE] teffects postestimation** for features available after estimation.

__Menu__

**Statistics > Treatment effects > Continuous outcomes >** **Nearest-neighbor**
**matching**

**Statistics > Treatment effects > Binary outcomes >** **Nearest-neighbor**
**matching**

**Statistics > Treatment effects > Count outcomes >** **Nearest-neighbor**
**matching**

**Statistics > Treatment effects > Fractional outcomes >** **Nearest-neighbor**
**matching**

**Statistics > Treatment effects > Nonnegative outcomes >** **Nearest-neighbor**
**matching**

__Description__

**teffects** **nnmatch** estimates the average treatment effect and average
treatment effect on the treated from observational data by
nearest-neighbor matching. Nearest-neighbor matching estimators impute
the missing potential outcome for each subject by using an average of the
outcomes of similar subjects that receive the other treatment level.
Similarity between subjects is based on a weighted function of the
covariates for each observation. The treatment effect is computed by
taking the average of the difference between the observed and imputed
potential outcomes for each subject. **teffects nnmatch** accepts a
continuous, binary, count, fractional, or nonnegative outcome.
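
In symbols, the description above can be sketched as follows. This is a simplified, unweighted version; the notation (the matched set \Omega(i), the imputed outcomes \hat{y}_{t,i}, and n_1 for the number of treated subjects) is chosen here for illustration, and the exact definitions are in *Methods and formulas*.

```latex
\hat{y}_{t,i} =
\begin{cases}
y_i & \text{if } t_i = t \\[4pt]
\dfrac{1}{|\Omega(i)|} \displaystyle\sum_{j \in \Omega(i)} y_j & \text{otherwise}
\end{cases}
\qquad t \in \{0, 1\}

\hat{\tau}_{\mathrm{ATE}} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_{1,i} - \hat{y}_{0,i} \right)
\qquad
\hat{\tau}_{\mathrm{ATET}} = \frac{1}{n_1} \sum_{i:\, t_i = 1} \left( \hat{y}_{1,i} - \hat{y}_{0,i} \right)
```

Here \Omega(i) holds the nearest neighbors of subject i drawn from the other treatment level, so each subject contributes one observed and one imputed potential outcome.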

See **[TE] teffects intro** or **[TE] teffects intro advanced** for more
information about estimating treatment effects from observational data.

__Options__

+-------+
----+ Model +------------------------------------------------------------

**nneighbor(***#***)** specifies the number of matches per observation. The default
is **nneighbor(1)**. Each observation is matched with at least the
specified number of observations from the other treatment level.
**nneighbor()** must specify an integer greater than or equal to 1 but no
larger than the number of observations in the smallest treatment
group.
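
As a minimal sketch of this option, the following requests three matches per observation. It assumes the **cattaneo2** dataset used in the Examples section; the choice of three matches is arbitrary and only for illustration.

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Match each observation with at least 3 neighbors from the other
* treatment level (3 is an arbitrary illustrative value)
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), nneighbor(3)

* The requested number of matches is stored in e(k_nneighbor)
display e(k_nneighbor)
```

Because of tied distances, the number of matches actually used for an observation can exceed the requested number; **e(k_nnmin)** and **e(k_nnmax)** report the range.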

**biasadj(***varlist***)** specifies that a linear function of the specified
covariates be used to correct for a large-sample bias that exists
when matching on more than one continuous covariate. By default, no
correction is performed.

Abadie and Imbens (2006, 2011) show that nearest-neighbor matching
estimators are not consistent when matching on two or more continuous
covariates and propose a bias-corrected estimator that is consistent.
The correction term uses a linear function of variables specified in
**biasadj()**; see example 3.

**ematch(***varlist***)** specifies that the variables in *varlist* match exactly.
All variables in *varlist* must be numeric and may be specified as
factors. **teffects** **nnmatch** exits with an error if any observations do
not have the requested exact match.

+------+
----+ Stat +-------------------------------------------------------------

*stat* is one of two statistics: **ate** or **atet**. **ate** is the default.

**ate** specifies that the average treatment effect be estimated.

**atet** specifies that the average treatment effect on the treated be
estimated.
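
For illustration, a short sketch that requests the ATET instead of the default ATE, again assuming the **cattaneo2** dataset from the Examples section:

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Estimate the average treatment effect on the treated
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), atet

* The statistic that was estimated is stored in e(stat)
display "`e(stat)'"
```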

+-----------+
----+ SE/Robust +--------------------------------------------------------

**vce(***vcetype***)** specifies the standard errors that are reported. By
default, **teffects** **nnmatch** uses two matches in estimating the robust
standard errors.

**vce(robust** [**,** **nn(***#***)**]**)** specifies that robust standard errors be
reported and, optionally, that *#* matches be used in estimating them.

**vce(iid)** specifies that standard errors for independently and
identically distributed data be reported.

The standard derivative-based standard-error estimators cannot be
used by **teffects** **nnmatch**, because these matching estimators are not
differentiable. The implemented methods were derived by Abadie and
Imbens (2006, 2011, 2012); see *Methods and formulas*.

As discussed in Abadie and Imbens (2008), bootstrap estimators do not
provide reliable standard errors for the estimator implemented by
**teffects** **nnmatch**.
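
The two *vcetype* choices might be compared as in the sketch below. It assumes the **cattaneo2** dataset from the Examples section, and the value 3 in **nn()** is arbitrary.

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Robust Abadie-Imbens standard errors using 3 matches
* (3 is an arbitrary illustrative value; the default is 2)
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), vce(robust, nn(3))
display e(k_robust)

* Standard errors that assume independently and identically distributed data
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), vce(iid)
display "`e(vce)'"
```

The point estimate should be the same in both fits; only the reported standard errors change.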

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**dmvariables** specifies that the matching variables be displayed.

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(%***fmt***)**, **pformat(%***fmt***)**, **sformat(%***fmt***)**, and **nolstretch**; see **[R]**
**estimation options**.
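
A brief sketch of the reporting options, assuming the **cattaneo2** dataset from the Examples section; the 90% level is an arbitrary choice.

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Report 90% confidence intervals and list the matching variables
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), level(90) dmvariables
```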

+----------+
----+ Advanced +---------------------------------------------------------

**caliper(***#***)** specifies the maximum distance at which two observations are a
potential match. By default, all observations are potential matches
regardless of how dissimilar they are.

The distance is based on *omvarlist*. If an observation does not have
at least **nneighbor(***#***)** matches, **teffects** **nnmatch** exits with an error
message. Use option **osample(***newvar***)** to identify all observations
that are deficient in matches.

**dtolerance(***#***)** specifies the tolerance used to determine exact matches.
The default value is **dtolerance(sqrt(c(epsdouble)))**.

Integer-valued variables are usually used for exact matching. The
**dtolerance()** option is useful when continuous variables are used for
exact matching.

**osample(***newvar***)** specifies that indicator variable *newvar* be created to
identify observations that violate the overlap assumption. This
variable will identify all observations that do not have at least
**nneighbor(***#***)** matches in the opposite treatment group within
**caliper(***#***)** (for **metric()** distance matching) or **dtolerance(***#***)** (for
**ematch(***varlist***)** exact matches).

The **vce(robust, nn(***#***))** option also requires at least *#* matches in the
same treatment group within the distance specified by **caliper(***#***)** or
within the exact matches specified by **dtolerance(***#***)**.

The average treatment effect on the treated, option **atet**, using
**vce(iid)** requires only **nneighbor(***#***)** control group matches for the
treated group.
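
One way to use **caliper()** and **osample()** together is sketched below. The caliper value 0.5 and the variable name **viol** are hypothetical, chosen only for illustration. The sketch assumes, as the text above suggests, that the **osample()** variable is created even when the command exits with an error.

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* caliper(0.5) is an arbitrary illustrative value. If some observations
* lack enough matches within the caliper, the command exits with an
* error, so capture is used to let a do-file continue.
capture noisily teffects nnmatch (bweight mage fage) (mbsmoke), ///
    caliper(0.5) osample(viol)

* Count the observations flagged as violating the overlap assumption
tabulate viol

* Refit on the observations that were not flagged. Dropping observations
* can create new violations, so this step may need to be repeated.
capture noisily teffects nnmatch (bweight mage fage) (mbsmoke) if viol == 0, ///
    caliper(0.5)
```

Restricting the sample changes the population to which the estimate applies, so the excluded observations are worth inspecting before the refit is interpreted.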

**control(***#* | *label***)** specifies the level of *tvar* that is the control. The
default is the first treatment level. You may specify the numeric
level *#* (a nonnegative integer) or the label associated with the
numeric level. **control()** and **tlevel()** may not specify the same
treatment level.

**tlevel(***#* | *label***)** specifies the level of *tvar* that is the treatment for
the statistic **atet**. The default is the second treatment level. You
may specify the numeric level *#* (a nonnegative integer) or the label
associated with the numeric level. **tlevel()** may only be specified
with statistic **atet**. **tlevel()** and **control()** may not specify the same
treatment level.
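
As an illustration, the sketch below sets both levels explicitly. It assumes the **cattaneo2** dataset from the Examples section and that **mbsmoke** is coded 0 for nonsmokers and 1 for smokers.

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Name the control and treatment levels explicitly; tlevel() is allowed
* only with atet. The 0/1 coding of mbsmoke is assumed here.
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), ///
    atet control(0) tlevel(1)

* The levels used are stored in e(control) and e(treated)
display e(control)
display e(treated)
```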

**generate(***stub***)** specifies that the observation numbers of the nearest
neighbors be stored in the new variables *stub***1**, *stub***2**, .... This
option is required if you wish to perform postestimation based on the
matching results. The number of variables generated may be more than
**nneighbor(***#***)** because of tied distances. These variables may not
already exist.
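
A minimal sketch of **generate()**, assuming the **cattaneo2** dataset from the Examples section; the stub **nb** is a hypothetical name chosen for illustration.

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Store the observation numbers of the nearest neighbors in nb1, nb2, ...
* (nb is a hypothetical stub name)
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke), generate(nb)

* See how many neighbor variables were created and inspect the first one
describe nb*
list mbsmoke nb1 in 1/5
```

More than one variable may appear even with the default **nneighbor(1)**, because tied distances add matches.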

**metric(***metric***)** specifies the distance matrix used as the weight matrix in
a quadratic form that transforms the multiple distances into a single
distance measure; see *Nearest-neighbor matching estimator* in *Methods*
*and formulas* of **[TE] teffects nnmatch** for details.
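
As a sketch of that quadratic form, write x_i and x_j for the covariate vectors of two observations, S for the sample covariance matrix of the covariates, and W for the weight matrix selected by **metric()**. These symbols are chosen here for illustration. How a user-supplied *matname* enters is not shown; see *Methods and formulas*.

```latex
d(i, j) = \sqrt{ (x_i - x_j)^{\top} \, W \, (x_i - x_j) },
\qquad
W =
\begin{cases}
S^{-1} & \text{for } \texttt{mahalanobis} \\
\{\operatorname{diag}(S)\}^{-1} & \text{for } \texttt{ivariance} \\
I & \text{for } \texttt{euclidean}
\end{cases}
```

With **ivariance**, each covariate is scaled by its own variance and correlations are ignored; with **euclidean**, covariates enter in their original units.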

The following option is available with **teffects** **nnmatch** but is not shown
in the dialog box:

**coeflegend**; see **[R] estimation options**.

__Examples__

Setup
**. webuse cattaneo2**

Estimate the average treatment effect of **mbsmoke** on **bweight**
**. teffects nnmatch (bweight mage prenatal1 mmarried fbaby)** **(mbsmoke)**

Refit the above model, but require exact matches on the binary variables
**. teffects nnmatch (bweight mage) (mbsmoke),** **ematch(prenatal1**
**mmarried fbaby) metric(euclidean)**

Match on two continuous variables, **mage** and **fage**, and use the
bias-adjusted estimator
**. teffects nnmatch (bweight mage fage) (mbsmoke),** **ematch(prenatal1**
**mmarried fbaby) biasadj(mage fage)**

__Video example__

Treatment effects in Stata: Nearest-neighbor matching

__Stored results__

**teffects** **nnmatch** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(n***j***)** number of observations for treatment level *j*
**e(k_levels)** number of levels in treatment variable
**e(treated)** level of treatment variable defined as treated
**e(control)** level of treatment variable defined as control
**e(k_nneighbor)** requested number of matches
**e(k_nnmin)** minimum number of matches
**e(k_nnmax)** maximum number of matches
**e(k_robust)** matches for robust VCE

Macros
**e(cmd)** **teffects**
**e(cmdline)** command as typed
**e(depvar)** name of outcome variable
**e(tvar)** name of treatment variable
**e(emvarlist)** exact match variables
**e(bavarlist)** variables used in bias adjustment
**e(mvarlist)** match variables
**e(subcmd)** **nnmatch**
**e(metric)** **mahalanobis**, **ivariance**, **euclidean**, or **matrix**
*matname*
**e(stat)** statistic estimated, **ate** or **atet**
**e(wtype)** weight type
**e(wexp)** weight expression
**e(title)** title in estimation output
**e(tlevels)** levels of treatment variable
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(datasignature)** the checksum
**e(datasignaturevars)** variables used in calculation of checksum
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsnotok)** predictions disallowed by **margins**

Matrices
**e(b)** coefficient vector
**e(V)** variance-covariance matrix of the estimators

Functions
**e(sample)** marks estimation sample
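
The stored results can be inspected after estimation, for example as in this sketch, which assumes the **cattaneo2** dataset from the Examples section:

```stata
* Load the example dataset used elsewhere in this help file
webuse cattaneo2, clear

* Fit with the defaults: ATE, one match per observation
teffects nnmatch (bweight mage prenatal1 mmarried fbaby) (mbsmoke)

* Scalars
display e(N)
display e(k_nneighbor)

* Macros
display "`e(subcmd)'"
display "`e(stat)'"

* Matrices
matrix list e(b)
matrix list e(V)

* Count the observations in the estimation sample
count if e(sample)
```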

__References__

Abadie, A., and G. W. Imbens. 2006. Large sample properties of matching
estimators for average treatment effects. *Econometrica* 74: 235-267.

------. 2008. On the failure of the bootstrap for matching estimators.
*Econometrica* 76: 1537-1557.

------. 2011. Bias-corrected matching estimators for average treatment
effects. *Journal of Business and Economic Statistics* 29: 1-11.

------. 2012. Matching on the estimated propensity score. Harvard
University and National Bureau of Economic Research.
http://www.hks.harvard.edu/fs/aabadie/pscore.pdf.