Stata 11 help for heckman


Title

[R] heckman -- Heckman selection model

Syntax

Basic syntax

heckman depvar [indepvars], select(varlist_s) [twostep]

or

heckman depvar [indepvars], select(depvar_s = varlist_s) [twostep]

Full syntax for maximum likelihood estimates only

heckman depvar [indepvars] [if] [in] [weight], select([depvar_s =] varlist_s [, offset(varname) noconstant]) [heckman_ml_options]

Full syntax for Heckman's two-step consistent estimates only

heckman depvar [indepvars] [if] [in], twostep select([depvar_s =] varlist_s [, noconstant]) [heckman_ts_options]

heckman_ml_options          description
-------------------------------------------------------------------------
Model
* select()                  specify selection equation: dependent and
                              independent variables; whether to have
                              constant term and offset variable
  noconstant                suppress constant term
  offset(varname)           include varname in model with coefficient
                              constrained to 1
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables

SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar,
                              opg, bootstrap, or jackknife

Reporting
  level(#)                  set confidence level; default is level(95)
  first                     report first-step probit estimates
  noskip                    perform likelihood-ratio test
  nshazard(newvar)          generate nonselection hazard variable
  mills(newvar)             synonym for nshazard()
  nocnsreport               do not display constraints
  display_options           control spacing and display of omitted
                              variables and base and empty cells

Maximization
  maximize_options          control the maximization process; seldom used

+ coeflegend                display coefficients' legend instead of
                              coefficient table
-------------------------------------------------------------------------
* select() is required. The full specification is
  select([depvar_s =] varlist_s [, offset(varname) noconstant]).
+ coeflegend does not appear in the dialog box.

heckman_ts_options          description
-------------------------------------------------------------------------
Model
* select()                  specify selection equation: dependent and
                              independent variables; whether to have
                              constant term
* twostep                   produce two-step consistent estimate
  noconstant                suppress constant term
  rhosigma                  truncate rho to [-1,1] with consistent sigma
  rhotrunc                  truncate rho to [-1,1]
  rholimited                truncate rho in limited cases
  rhoforce                  do not truncate rho

SE
  vce(vcetype)              vcetype may be conventional, bootstrap, or
                              jackknife

Reporting
  level(#)                  set confidence level; default is level(95)
  first                     report first-step probit estimates
  nshazard(newvar)          generate nonselection hazard variable
  mills(newvar)             synonym for nshazard()
  display_options           control spacing and display of omitted
                              variables and base and empty cells

+ coeflegend                display coefficients' legend instead of
                              coefficient table
-------------------------------------------------------------------------
* select() and twostep are required. The full specification is
  select([depvar_s =] varlist_s [, noconstant]).
+ coeflegend does not appear in the dialog box.

indepvars and varlist_s may contain factor variables; see fvvarlist.

depvar, indepvars, varlist_s, and depvar_s may contain time-series operators; see tsvarlist.

bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see prefix.

Weights are not allowed with the bootstrap prefix.

aweights are not allowed with the jackknife prefix.

twostep, vce(), first, noskip, and weights are not allowed with the svy prefix.

pweights, aweights, fweights, and iweights are allowed with maximum likelihood estimation; see weight. No weights are allowed if twostep is specified.

See [R] heckman postestimation for features available after estimation.

Menu

heckman for maximum likelihood estimates

Statistics > Sample-selection models > Heckman selection model (ML)

heckman for two-step consistent estimates

Statistics > Sample-selection models > Heckman selection model (two-step)

Description

heckman fits regression models with selection by using either Heckman's two-step consistent estimator or full maximum likelihood.

Options for Heckman selection model (ML)

    +-------+
----+ Model +------------------------------------------------------------

select(...) specifies the variables and options for the selection equation. It is an integral part of specifying a Heckman model and is required. The selection equation should contain at least one variable that is not in the outcome equation.

If depvar_s is specified, it should be coded as 0 or 1, with 0 indicating an observation not selected and 1 indicating a selected observation. If depvar_s is not specified, observations for which depvar is not missing are assumed selected, and those for which depvar is missing are assumed not selected.
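The selection rule above can be sketched in Python as follows. This is an illustration only, not Stata code: it assumes that None or NaN stands in for Stata's missing values, and the helper name selection_indicator is hypothetical. Following the Remarks, any nonzero depvar_s is treated as selected.

```python
import math


def selection_indicator(depvar, depvar_s=None):
    """Hypothetical helper: 0/1 selection indicator following the select() rules.

    If depvar_s is supplied, nonzero values mean selected and 0 means not
    selected. Otherwise an observation is selected when depvar is not
    missing; None and NaN stand in for Stata's missing values here.
    """
    if depvar_s is not None:
        return [0 if s == 0 else 1 for s in depvar_s]

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    return [0 if is_missing(y) else 1 for y in depvar]


wage = [12.5, None, 9.8, float("nan"), 20.1]
print(selection_indicator(wage))  # [1, 0, 1, 0, 1]
```

This mirrors the Stata example generate wageseen = (wage < .), which builds the same indicator explicitly so that it can be passed as depvar_s.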

noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.

    +-----------+
----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory, that are robust to some kinds of misspecification, that allow for intragroup correlation, and that use bootstrap or jackknife methods; see [R] vce_option.

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

first specifies that the first-step probit estimates of the selection equation be displayed before estimation.

noskip specifies that a full maximum-likelihood model with only a constant for the regression equation be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test for the model test statistic displayed in the estimation header. By default, the overall model test statistic is an asymptotically equivalent Wald test that all the parameters in the regression equation are zero (except the constant). For many models, this option can substantially increase estimation time.

nshazard(newvar) and mills(newvar) are synonyms; either will create a new variable containing the nonselection hazard -- what Heckman (1979) referred to as the inverse of the Mills' ratio -- from the selection equation. The nonselection hazard is computed from the estimated parameters of the selection equation.
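For observation j with selection-equation index Z_j*g, the nonselection hazard is phi(Z_j*g)/Phi(Z_j*g), where phi and Phi are the standard normal density and cumulative distribution function. A minimal Python sketch of that formula follows; it is not Stata's implementation, and the function names are hypothetical.

```python
import math


def norm_pdf(z):
    # Standard normal density phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # Standard normal CDF Phi(z); erfc keeps precision in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def nonselection_hazard(zg):
    """phi(Zg) / Phi(Zg) for a selection-equation index Zg."""
    return norm_pdf(zg) / norm_cdf(zg)


print(round(nonselection_hazard(0.0), 4))  # 0.7979
for zg in (-1.0, 1.0, 2.0):
    print(zg, nonselection_hazard(zg))
```

The hazard is large when selection is unlikely (Zg very negative) and shrinks toward 0 as Zg grows.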

nocnsreport; see [R] estimation options.

display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels; see [R] estimation options.

    +--------------+
----+ Maximization +-----------------------------------------------------

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, from(init_specs); see [R] maximize. These options are seldom used.

Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).

The following option is available with heckman but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Options for Heckman selection model (two-step)

    +-------+
----+ Model +------------------------------------------------------------

select(...) specifies the variables and options for the selection equation. It is an integral part of specifying a Heckman model and is required. The selection equation should contain at least one variable that is not in the outcome equation.

If depvar_s is specified, it should be coded as 0 or 1, with 0 indicating an observation not selected and 1 indicating a selected observation. If depvar_s is not specified, observations for which depvar is not missing are assumed selected, and those for which depvar is missing are assumed not selected.

twostep specifies that Heckman's two-step consistent estimates of the parameters, standard errors, and covariance matrix be produced.
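In the usual description of Heckman's two-step procedure, a probit is fit for selection, the nonselection hazard lambda_j = phi(Z_j*g)/Phi(Z_j*g) is formed, and depvar is then regressed on x and lambda over the selected observations; the coefficient on lambda estimates rho*sigma. The Python sketch below simulates data from the model in Remarks and runs only that second step. To stay short it assumes the index Z*g is known (the true simulated values) instead of estimating g by probit, and it computes point estimates only, not the two-step covariance matrix. All names and parameter values are hypothetical.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def second_step(x, y, zg, selected):
    """OLS of y on (1, x, lambda) over selected rows; returns [b0, b1, B_m]."""
    rows, ys = [], []
    for xi, yi, zi, si in zip(x, y, zg, selected):
        if si:
            rows.append((1.0, xi, norm_pdf(zi) / norm_cdf(zi)))
            ys.append(yi)
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Simulate y = xb + u_1, observed if Zg + u_2 > 0, with hypothetical parameters.
random.seed(12345)
b0, b1, sigma, rho = 1.0, 2.0, 1.5, 0.6
x, y, zg, selected = [], [], [], []
for _ in range(20000):
    xi = random.gauss(0.0, 1.0)
    wi = random.gauss(0.0, 1.0)          # appears only in the selection equation
    u2 = random.gauss(0.0, 1.0)
    u1 = sigma * (rho * u2 + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0))
    index = 0.2 + 0.5 * xi + 1.0 * wi    # Z*g with hypothetical g
    x.append(xi)
    zg.append(index)
    selected.append(index + u2 > 0.0)
    y.append(b0 + b1 * xi + u1)          # only used where selected

est = second_step(x, y, zg, selected)
print("b0, b1, B_m =", [round(v, 3) for v in est])
print("rho*sigma   =", rho * sigma)
```

With a probit estimate of g in place of the true index, the same regression gives the two-step point estimates; the ordinary OLS standard errors from this regression are not valid, because lambda is itself estimated, which is why heckman, twostep reports Heckman's two-step variance estimator instead.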

noconstant; see [R] estimation options.

rhosigma, rhotrunc, rholimited, and rhoforce are rarely used options to specify how the two-step estimator (option twostep) handles unusual cases in which the two-step estimate of rho is outside the admissible range for a correlation, [-1,1]. When rho is outside this range, the two-step estimate of the coefficient variance-covariance matrix may not be positive definite and thus may be unusable for testing. The default is rhosigma.

rhosigma specifies that rho be truncated, as with the rhotrunc option, and that the estimate of sigma be made consistent with rho_hat, the truncated estimate of rho. So, sigma_hat = B_m * rho_hat; see Methods and formulas in [R] heckman for the definition of B_m. Both the truncated rho and the new estimate of sigma_hat are used in all computations to estimate the two-step covariance matrix.

rhotrunc specifies that rho be truncated to lie in the range [-1,1]. If the two-step estimate is less than -1, rho is set to -1; if the two-step estimate is greater than 1, rho is set to 1. This truncated value of rho is used in all computations to estimate the two-step covariance matrix.

rholimited specifies that rho be truncated only in computing the diagonal matrix D as it enters V_twostep and Q; see Methods and formulas in [R] heckman. In all other computations, the untruncated estimate of rho is used.

rhoforce specifies that the two-step estimate of rho be retained, even if it is outside the admissible range for a correlation. This option may, in rare cases, lead to a non-positive definite covariance matrix.

These options have no effect when estimation is by maximum likelihood, the default. They also have no effect when the two-step estimate of rho is in the range [-1,1].
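The decision rules for these four options can be sketched in Python as below. The sketch assumes, as in the usual two-step derivation, that the untruncated estimate is rho = B_m/sigma, with B_m the coefficient on the nonselection hazard in the second-step regression; see Methods and formulas in [R] heckman for Stata's exact definitions. It only shows which values of rho and sigma each option feeds into the covariance computations; the function name and the returned dictionary are hypothetical.

```python
def two_step_rho(b_m, sigma_twostep, option="rhosigma"):
    """Hypothetical helper: rho and sigma values each option passes on.

    Assumes the untruncated estimate is rho = b_m / sigma_twostep, with b_m
    the coefficient on the nonselection hazard. Returns a dict with
      rho_D : rho used in the diagonal matrix D
      rho   : rho used in the other covariance computations
      sigma : sigma used in the covariance computations
    """
    rho = b_m / sigma_twostep
    if -1.0 <= rho <= 1.0:
        # Inside [-1,1]: the options have no effect.
        return {"rho_D": rho, "rho": rho, "sigma": sigma_twostep}
    rho_t = max(-1.0, min(1.0, rho))  # truncate to the nearer bound
    if option == "rhosigma":          # the default
        return {"rho_D": rho_t, "rho": rho_t, "sigma": b_m * rho_t}
    if option == "rhotrunc":
        return {"rho_D": rho_t, "rho": rho_t, "sigma": sigma_twostep}
    if option == "rholimited":
        return {"rho_D": rho_t, "rho": rho, "sigma": sigma_twostep}
    if option == "rhoforce":
        return {"rho_D": rho, "rho": rho, "sigma": sigma_twostep}
    raise ValueError("unknown option: " + option)


print(two_step_rho(1.2, 1.0))                # {'rho_D': 1.0, 'rho': 1.0, 'sigma': 1.2}
print(two_step_rho(1.2, 1.0, "rholimited"))  # {'rho_D': 1.0, 'rho': 1.2, 'sigma': 1.0}
```

Because sigma_twostep is positive, the untruncated rho has the sign of B_m, so under rhosigma the truncated rho_hat is -1 or 1 with that same sign and sigma_hat = B_m * rho_hat equals |B_m|.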

    +----+
----+ SE +---------------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory and that use bootstrap or jackknife methods; see [R] vce_option.

vce(conventional), the default, uses the two-step variance estimator derived by Heckman.

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

first specifies that the first-step probit estimates of the selection equation be displayed before estimation.

nshazard(newvar) and mills(newvar) are synonyms; either will create a new variable containing the nonselection hazard -- what Heckman (1979) referred to as the inverse of the Mills' ratio -- from the selection equation. The nonselection hazard is computed from the estimated parameters of the selection equation.

display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels; see [R] estimation options.

The following option is available with heckman but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Remarks

heckman estimates all the parameters in the model

    y = xb + u_1                     (regression equation: y is depvar, x is indepvars)

    y is observed if Zg + u_2 > 0    (selection equation: Z is varlist_s)

where

    u_1 ~ N(0, sigma^2), so sigma is the standard deviation of u_1
    u_2 ~ N(0, 1)
    corr(u_1, u_2) = rho

In the syntax for heckman, depvar and indepvars are the dependent variable and regressors for the underlying regression model (y = xb), and varlist_s are the variables (Z) thought to determine whether depvar is observed (selected) or not observed (not selected). By default, heckman assumes that missing values (see missing) of depvar mean that the dependent variable is unobserved (not selected). With some datasets, it is more convenient to specify a binary variable (depvar_s) that identifies the observations for which the dependent variable is observed/selected (depvar_s != 0) or not observed (depvar_s == 0); heckman accommodates either type of data.
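Under this model, the full maximum-likelihood estimator maximizes the sum of per-observation log likelihoods: ln Phi(-Z_j*g) for a nonselected observation, and ln Phi((Z_j*g + rho*e_j/sigma)/sqrt(1 - rho^2)) + ln phi(e_j/sigma) - ln sigma for a selected one, where e_j = y_j - x_j*b. The Python sketch below evaluates these contributions directly from the model's definitions. It is an illustration, not Stata's likelihood evaluator (which may parameterize rho and sigma differently), and the function name is hypothetical.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_norm_cdf(z):
    # ln Phi(z); erfc keeps precision in the lower tail
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def heckman_loglik_obs(selected, zg, e=0.0, sigma=1.0, rho=0.0):
    """Hypothetical helper: log-likelihood contribution of one observation.

    selected: True if y_j is observed
    zg:       selection index Z_j*g
    e:        regression residual y_j - x_j*b (ignored if not selected)
    sigma:    standard deviation of u_1 (must be > 0)
    rho:      corr(u_1, u_2), assumed strictly inside (-1, 1)
    """
    if not selected:
        # P(Zg + u_2 <= 0) = Phi(-Zg)
        return log_norm_cdf(-zg)
    # P(u_2 > -Zg | u_1 = e), using u_2 | u_1 ~ N(rho*e/sigma, 1 - rho^2)
    a = (zg + rho * e / sigma) / math.sqrt(1.0 - rho * rho)
    # ln of the density of y_j: ln[phi(e/sigma)/sigma]
    log_density = -0.5 * (e / sigma) ** 2 - LOG_SQRT_2PI - math.log(sigma)
    return log_norm_cdf(a) + log_density


# Hypothetical three-observation sample: (selected, Zg, residual)
sample = [(True, 0.8, 0.3), (False, -0.2, 0.0), (True, 1.1, -0.5)]
ll = sum(heckman_loglik_obs(s, zg, e, sigma=1.2, rho=0.4) for s, zg, e in sample)
print(ll)
```

With rho = 0 the selected contribution splits into a probit term ln Phi(Zg) plus an ordinary normal regression term, which is why selection can be ignored only when the errors are uncorrelated.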

Examples

    Setup
        . webuse womenwk

    Obtain full ML estimates
        . heckman wage educ age, select(married children educ age)

    Obtain Heckman's two-step consistent estimates
        . heckman wage educ age, select(married children educ age) twostep

    Define and use each equation separately
        . global wage_eqn wage educ age
        . global seleqn married children age
        . heckman $wage_eqn, select($seleqn)

    Use a variable to identify selection
        . generate wageseen = (wage < .)
        . heckman wage educ age, select(wageseen = married children educ age)

    Specify robust variance
        . heckman wage educ age, select(married children educ age) vce(robust)

    Specify clustering on county
        . heckman $wage_eqn, select($seleqn) vce(cluster county)

    Report first-step probit estimates
        . heckman wage educ age, select(married children educ age) first

    Create mymills containing nonselection hazard
        . heckman $wage_eqn, select($seleqn) mills(mymills)

    No constant in regression equation
        . heckman wage educ age, noconstant select(married children educ age)

    No constant in selection equation
        . heckman wage educ age, select(married children educ age, noconstant)

Saved results

heckman (maximum likelihood) saves the following in e():

    Scalars
      e(N)                number of observations
      e(N_cens)           number of censored observations
      e(k)                number of parameters
      e(k_eq)             number of equations
      e(k_eq_model)       number of equations in model Wald test
      e(k_aux)            number of auxiliary parameters
      e(k_dv)             number of dependent variables
      e(k_autoCns)        number of base, empty, and omitted constraints
      e(df_m)             model degrees of freedom
      e(ll)               log likelihood
      e(ll_0)             log likelihood, constant-only model
      e(N_clust)          number of clusters
      e(lambda)           lambda
      e(selambda)         standard error of lambda
      e(sigma)            sigma
      e(chi2)             chi-squared
      e(chi2_c)           chi-squared for comparison test
      e(p_c)              p-value for comparison test
      e(p)                significance of model test
      e(rho)              rho
      e(rank)             rank of e(V)
      e(rank0)            rank of e(V) for constant-only model
      e(ic)               number of iterations
      e(rc)               return code
      e(converged)        1 if converged, 0 otherwise

    Macros
      e(cmd)              heckman
      e(cmdline)          command as typed
      e(depvar)           names of dependent variables
      e(wtype)            weight type
      e(wexp)             weight expression
      e(title)            title in estimation output
      e(title2)           secondary title in estimation output
      e(clustvar)         name of cluster variable
      e(offset1)          offset for regression equation
      e(offset2)          offset for selection equation
      e(mills)            variable containing nonselection hazard (inverse of
                            Mills' ratio)
      e(chi2type)         Wald or LR; type of model chi-squared test
      e(chi2_ct)          Wald or LR; type of model chi-squared test
                            corresponding to e(chi2_c)
      e(vce)              vcetype specified in vce()
      e(vcetype)          title used to label Std. Err.
      e(diparm#)          display transformed parameter #
      e(opt)              type of optimization
      e(which)            max or min; whether optimizer is to perform
                            maximization or minimization
      e(method)           ml
      e(ml_method)        type of ml method
      e(user)             name of likelihood-evaluator program
      e(technique)        maximization technique
      e(singularHmethod)  m-marquardt or hybrid; method used when Hessian is
                            singular
      e(crittype)         optimization criterion
      e(properties)       b V
      e(predict)          program used to implement predict
      e(marginsok)        predictions allowed by margins
      e(asbalanced)       factor variables fvset as asbalanced
      e(asobserved)       factor variables fvset as asobserved

    Matrices
      e(b)                coefficient vector
      e(Cns)              constraints matrix
      e(ilog)             iteration log (up to 20 iterations)
      e(gradient)         gradient vector
      e(V)                variance-covariance matrix of the estimators
      e(V_modelbased)     model-based variance

    Functions
      e(sample)           marks estimation sample

heckman (two-step) saves the following in e():

    Scalars
      e(N)                number of observations
      e(N_cens)           number of censored observations
      e(df_m)             model degrees of freedom
      e(lambda)           lambda
      e(selambda)         standard error of lambda
      e(sigma)            sigma
      e(chi2)             chi-squared
      e(p)                significance of model test
      e(rho)              rho
      e(rank)             rank of e(V)

    Macros
      e(cmd)              heckman
      e(cmdline)          command as typed
      e(depvar)           names of dependent variables
      e(title)            title in estimation output
      e(title2)           secondary title in estimation output
      e(mills)            variable containing nonselection hazard (inverse of
                            Mills' ratio)
      e(chi2type)         Wald or LR; type of model chi-squared test
      e(vce)              vcetype specified in vce()
      e(vcetype)          title used to label Std. Err.
      e(rhometh)          rhosigma, rhotrunc, rholimited, or rhoforce
      e(method)           twostep
      e(properties)       b V
      e(predict)          program used to implement predict
      e(marginsok)        predictions allowed by margins
      e(marginsnotok)     predictions disallowed by margins
      e(asbalanced)       factor variables fvset as asbalanced
      e(asobserved)       factor variables fvset as asobserved

    Matrices
      e(b)                coefficient vector
      e(V)                variance-covariance matrix of the estimators

    Functions
      e(sample)           marks estimation sample

References

Heckman, J. 1976. The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 5: 475-492.

Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153-161.

Also see

Manual: [R] heckman

Help: [R] heckman postestimation; [R] heckprob, [R] regress, [R] tobit, [R] treatreg, [SVY] svy estimation


© Copyright 1996–2010 StataCorp LP