[R] heckman postestimation -- Postestimation tools for heckman
Postestimation commands
The following postestimation commands are available after heckman:
  Command            Description
-------------------------------------------------------------------------
  contrast           contrasts and ANOVA-style joint tests of estimates
* estat ic           Akaike's and Schwarz's Bayesian information criteria
                       (AIC and BIC)
  estat summarize    summary statistics for the estimation sample
  estat vce          variance-covariance matrix of the estimators (VCE)
  estat (svy)        postestimation statistics for survey data
  estimates          cataloging estimation results
+ hausman            Hausman's specification test
  lincom             point estimates, standard errors, testing, and
                       inference for linear combinations of coefficients
+ lrtest             likelihood-ratio test; not available with two-step
                       estimator
  margins            marginal means, predictive margins, marginal effects,
                       and average marginal effects
  marginsplot        graph the results from margins (profile plots,
                       interaction plots, etc.)
  nlcom              point estimates, standard errors, testing, and
                       inference for nonlinear combinations of coefficients
  predict            predictions, residuals, influence statistics, and
                       other diagnostic measures
  predictnl          point estimates, standard errors, testing, and
                       inference for generalized predictions
  pwcompare          pairwise comparisons of estimates
* suest              seemingly unrelated estimation
  test               Wald tests of simple and composite linear hypotheses
  testnl             Wald tests of nonlinear hypotheses
-------------------------------------------------------------------------
* estat ic and suest are not appropriate after heckman, twostep.
+ hausman and lrtest are not appropriate with svy estimation results.
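A minimal sketch of a few of these commands in use, assuming the womenwk dataset and model from the Examples section. The particular hypotheses tested are illustrative only; the equation names wage and select are the ones heckman assigns to the regression and selection equations for this model.

```stata
* Setup: dataset and model from the Examples section
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

* Wald test that educ and age are jointly zero in the wage equation
test [wage]educ [wage]age

* Linear combination: change in the wage linear prediction
* for 4 additional years of education (illustrative choice)
lincom 4*[wage]educ

* Information criteria; appropriate here because the model was fit by ML,
* not by the two-step estimator
estat ic
```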
Syntax for predict
After ML or twostep
predict [type] newvar [if] [in] [, statistic nooffset]
After ML
predict [type] {stub*|newvar_reg newvar_sel newvar_athrho
newvar_lnsigma} [if] [in] , scores
  statistic          Description
-------------------------------------------------------------------------
Main
  xb                 linear prediction; the default
  stdp               standard error of the prediction
  stdf               standard error of the forecast
  xbsel              linear prediction for selection equation
  stdpsel            standard error of the linear prediction for
                       selection equation
  pr(a,b)            Pr(y | a < y < b)
  e(a,b)             E(y | a < y < b)
  ystar(a,b)         E(y*), y* = max{a,min(y,b)}
  ycond              E(y | y observed)
  yexpected          E(y*), y taken to be 0 where unobserved
  nshazard or mills  nonselection hazard (also called inverse of
                       Mills's ratio)
  psel               Pr(y observed)
-------------------------------------------------------------------------
These statistics are available both in and out of sample; type predict
... if e(sample) ... if wanted only for the estimation sample.
stdf is not allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a >= .) means minus
infinity, and b missing (b >= .) means plus infinity; see missing.
Menu for predict
Statistics > Postestimation
Description for predict
predict creates a new variable containing predictions such as linear
predictions, standard errors, probabilities, expected values, and
nonselection hazards.
Options for predict
        +------+
    ----+ Main +-------------------------------------------------------------
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the prediction, which can be
thought of as the standard error of the predicted expected value or
mean for the observation's covariate pattern. The standard error of
the prediction is also referred to as the standard error of the
fitted value.
stdf calculates the standard error of the forecast, which is the standard
error of the point prediction for 1 observation. It is commonly
referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always
larger than those produced by stdp; see Methods and formulas in [R]
regress postestimation.
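As a rough illustration of the difference between stdp and stdf, the sketch below computes both after the womenwk model from the Examples section. The new variable names (xbhat, se_p, se_f) are illustrative.

```stata
* Setup: dataset and model from the Examples section
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

* Linear prediction and its two standard errors
predict double xbhat, xb
predict double se_p,  stdp
predict double se_f,  stdf

* stdf should exceed stdp in every observation, so this count should be 0
count if se_f <= se_p & !missing(se_f, se_p)

summarize xbhat se_p se_f
```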
xbsel calculates the linear prediction for the selection equation.
stdpsel calculates the standard error of the linear prediction for the
selection equation.
pr(a,b) calculates Pr(a < xb + u < b), the probability that y|x would be
observed in the interval (a,b).
a and b may be specified as numbers or variable names; lb and ub are
variable names;
pr(20,30) calculates Pr(20 < xb + u < 30);
pr(lb,ub) calculates Pr(lb < xb + u < ub); and
pr(20,ub) calculates Pr(20 < xb + u < ub).
a missing (a >= .) means minus infinity; pr(.,30) calculates
Pr(xb + u < 30);
pr(lb,30) calculates Pr(xb + u < 30) in observations for which lb >= .
and calculates Pr(lb < xb + u < 30) elsewhere.
b missing (b >= .) means plus infinity; pr(20,.) calculates
Pr(xb + u > 20);
pr(20,ub) calculates Pr(xb + u > 20) in observations for which ub >= .
and calculates Pr(20 < xb + u < ub) elsewhere.
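The bound rules above can be sketched as follows, assuming the womenwk model from the Examples section. The variables lb and ub are hypothetical bounds created only for this illustration; lb is set to missing for some observations to show how a missing lower bound is treated as minus infinity.

```stata
* Setup: dataset and model from the Examples section
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

* Hypothetical per-observation bounds (not part of the dataset)
generate double lb = 20
replace  lb = . if age < 30      // missing lower bound: minus infinity
generate double ub = 30

predict double p_20_30, pr(20,30)    // Pr(20 < xb + u < 30)
predict double p_lt_30, pr(.,30)     // Pr(xb + u < 30)
predict double p_gt_20, pr(20,.)     // Pr(xb + u > 20)
predict double p_var,   pr(lb,ub)    // bounds read from variables

* Where lb is missing, p_var should match p_lt_30;
* elsewhere it should match p_20_30
summarize p_var p_lt_30 if missing(lb)
summarize p_var p_20_30 if !missing(lb)
```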
e(a,b) calculates E(xb + u | a < xb + u < b), the expected value of y|x
conditional on y|x being in the interval (a,b), meaning that y|x is
truncated. a and b are specified as they are for pr().
ystar(a,b) calculates E(y*), where y* = a if xb + u < a, y* = b if
xb + u > b, and y* = xb + u otherwise, meaning that y* is censored
at a and b. a and b are specified as they are for pr().
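To see how e(a,b) and ystar(a,b) relate, the sketch below rebuilds the ystar() value from its definition as a probability-weighted combination of the two limits and the truncated mean. It assumes the womenwk model from the Examples section, and the limits 20 and 30 are arbitrary. The decomposition follows from the definition of y* given above and is not a statement about how predict computes the value internally; the two columns should agree up to rounding.

```stata
* Setup: dataset and model from the Examples section
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

predict double e_ab,  e(20,30)       // E(xb + u | 20 < xb + u < 30)
predict double ys_ab, ystar(20,30)   // E(y*), y* = max{20, min(y, 30)}

* Probabilities of the three regions
predict double p_lo,  pr(.,20)
predict double p_mid, pr(20,30)
predict double p_hi,  pr(30,.)

* E(y*) rebuilt from its definition
generate double ys_check = 20*p_lo + p_mid*e_ab + 30*p_hi

summarize ys_ab ys_check
```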
ycond calculates the expected value of the dependent variable conditional
on the dependent variable being observed, that is, selected.
yexpected calculates the expected value of the dependent variable (y*),
where that value is taken to be 0 when it is expected to be
unobserved.
The assumption of 0 is valid for many cases where nonselection
implies nonparticipation (for example, unobserved wage levels,
insurance claims from those who are uninsured) but may be
inappropriate for some problems (for example, unobserved disease
incidence).
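A minimal sketch relating ycond and yexpected to the other statistics, assuming the womenwk model from the Examples section. The formulas in the comments are the standard Heckman selection-model expressions and are an assumption here, not something stated in this help file; e(rho) and e(sigma) are results stored by heckman.

```stata
* Setup: dataset and model from the Examples section
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

predict double xbhat, xb
predict double lam,   mills
predict double ps,    psel
predict double yc,    ycond
predict double ye,    yexpected

* Assumed formulas (standard Heckman model):
*   E(y | y observed) = xb + rho*sigma*(nonselection hazard)
*   E(y*)             = Pr(y observed) * E(y | y observed)
generate double yc_check = xbhat + e(rho)*e(sigma)*lam
generate double ye_check = ps*yc

summarize yc yc_check ye ye_check
```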
nshazard and mills are synonyms; both calculate the nonselection hazard
-- what Heckman (1979) referred to as the inverse of the Mills ratio
-- from the selection equation.
psel calculates the probability of selection (or being observed).
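As a sketch, the nonselection hazard and the selection probability can be rebuilt from the selection-equation linear prediction. This assumes the womenwk model from the Examples section and the usual probit-based definitions, normal(zg) for the probability and normalden(zg)/normal(zg) for the hazard, which are assumptions in this illustration rather than formulas given in this help file.

```stata
* Setup: dataset and model from the Examples section
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

predict double zg,  xbsel       // selection-equation linear prediction
predict double ps,  psel        // Pr(y observed)
predict double lam, nshazard    // same statistic as mills

* Assumed definitions based on the standard normal cdf and density
generate double ps_check  = normal(zg)
generate double lam_check = normalden(zg)/normal(zg)

summarize ps ps_check lam lam_check
```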
nooffset is relevant when you specify offset(varname) for heckman. It
modifies the calculations made by predict so that they ignore the
offset variable; the linear prediction is treated as xb rather than
as xb + offset.
scores, not available with twostep, calculates equation-level score
variables.
The first new variable will contain the derivative of the log
likelihood with respect to the regression equation.
The second new variable will contain the derivative of the log
likelihood with respect to the selection equation.
The third new variable will contain the derivative of the log
likelihood with respect to the third equation (athrho).
The fourth new variable will contain the derivative of the log
likelihood with respect to the fourth equation (lnsigma).
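A short sketch of the scores option, assuming the womenwk model from the Examples section fit by ML. The stub sc is illustrative; with stub*, predict names the four new variables sc1 through sc4 in the equation order listed above.

```stata
* Setup: dataset and model from the Examples section (ML, not twostep)
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

* One score variable per equation: regression, selection, athrho, lnsigma
predict double sc*, scores

describe sc1 sc2 sc3 sc4
summarize sc1 sc2 sc3 sc4
```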
Syntax for margins
margins [marginlist] [, options]
margins [marginlist] , predict(statistic ...) [predict(statistic ...)
...] [options]
  statistic          Description
-------------------------------------------------------------------------
  xb                 linear prediction; the default
  xbsel              linear prediction for selection equation
  pr(a,b)            Pr(y | a < y < b)
  e(a,b)             E(y | a < y < b)
  ystar(a,b)         E(y*), y* = max{a,min(y,b)}
* ycond              E(y | y observed)
* yexpected          E(y*), y taken to be 0 where unobserved
  nshazard or mills  nonselection hazard (also called inverse of Mills's
                       ratio)
  psel               Pr(y observed)
  stdp               not allowed with margins
  stdf               not allowed with margins
  stdpsel            not allowed with margins
-------------------------------------------------------------------------
* ycond and yexpected are not allowed with margins after heckman,
twostep.
Statistics not allowed with margins are functions of stochastic
quantities other than e(b).
For the full syntax, see [R] margins.
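The margins syntax above might be used as in the sketch below, which assumes the womenwk model from the Examples section fit by ML, so that ycond and yexpected are allowed. The covariate values in at() are arbitrary.

```stata
* Setup: dataset and model from the Examples section (ML, not twostep)
webuse womenwk, clear
heckman wage educ age, select(married children educ age)

* Average marginal effects on the probability of selection
margins, dydx(educ age) predict(psel)

* Average predictions for two statistics in one call
margins, predict(ycond) predict(yexpected)

* Linear prediction at chosen education levels (values are illustrative)
margins, at(educ=(10 12 16)) predict(xb)
```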
Menu for margins
Statistics > Postestimation
Description for margins
margins estimates margins of response for linear predictions,
probabilities, expected values, and nonselection hazards.
Examples
Setup
. webuse womenwk
. heckman wage educ age, select(married children educ age)
Predicted wage conditional on it being observed
. predict ycond, ycond
Probability of wage being observed
. predict probseen, psel
Reference
Heckman, J. 1979. Sample selection bias as a specification error.
Econometrica 47: 153-161.