How are estimates of rho outside the bounds [−1,1] handled in the
two-step Heckman estimator?
Computation of rho, disturbance correlation, in the two-step Heckman estimator.
Vince Wiggins, StataCorp
This is a technical FAQ. Technical FAQs address specific issues or
computational aspects of estimators or other commands. They are typically
written for someone who already knows and fully understands the statistics of
the command—an expert in the area. All the materials in the Stata
manual, including the Methods and Formulas and sometimes the references, are
assumed to be understood.
These FAQs often deal with issues that are not considered or adequately
addressed in the literature, and, as such, we welcome insights from readers or
related citations that we may have missed.
Stata’s maximum likelihood estimator for a regression model with selection
constrains the estimated correlation between the disturbances of the regression
and selection equations to be in the admissible range of a correlation, [−1,1]. This
estimation is performed by heckman, and the command allows
estimation of the same model using Heckman’s (1979) two-step estimator with
associated variance–covariance matrix (VCE). One issue with the two-step
estimator is that it can produce estimates of rho that lie outside the range
[−1,1] and this can, in rare cases, lead to an estimated VCE that is not
positive definite and may even have negative elements on the diagonal.
Greene (1981) notes this possibility but offers no solution. Stata provides
four ways of handling two-step estimates of rho outside the admissible range.
These are options that work with heckman ..., twostep. From the
heckman help file:
- rhosigma, rhotrunc, rholimited, and rhoforce are rarely used options
to specify how the two-step estimator (option twostep) handles unusual cases in
which the two-step estimate of rho is outside the admissible range for a
correlation, [−1,1]. When rho is outside this range, the two-step estimate of
the coefficient variance–covariance matrix may not be positive definite and thus
may be unusable for testing. The default is rhosigma.
- rhotrunc specifies that rho be truncated
to lie in the range [−1,1]. If the two-step estimate is
less than −1, rho is set to −1; if the two-step estimate is above 1, rho is set
to 1. This truncated value of rho is used in all computations to estimate
the two-step covariance matrix.
- rhosigma specifies that rho be truncated, as with
option rhotrunc, and that the estimate of sigma be made consistent
with rho_hat, the truncated estimate of rho. So, sigma_hat = B_m * rho_hat;
see the Methods and Formulas section of [R] heckman for the
definition of B_m. Both the truncated rho and the new estimate of sigma_hat
are used in all computations to estimate the two-step covariance matrix.
- rholimited specifies that rho be truncated only in
the computation of diagonal matrix D as it enters V_twostep and Q; see
[R] heckman Methods and Formulas. In all other computations the
untruncated estimate of rho is used.
- rhoforce specifies that the two-step estimate of rho be
retained even if it is outside the admissible range for a correlation. This
may, in rare cases, lead to a nonpositive-definite covariance matrix.
- These options have no effect when estimation is by maximum
likelihood, the default. They also have no effect when the two-step
estimate of rho is in the range [−1,1].
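The four rules above can be sketched in a few lines of Python. This is only an illustration, not Stata's implementation: the function name adjust_rho and its arguments are hypothetical, and the sketch assumes the unconstrained two-step estimate of rho is B_m / sigma_hat, with B_m as defined in [R] heckman Methods and Formulas. The third returned value stands for the rho that would enter the diagonal matrix D.

```python
def adjust_rho(beta_m, sigma, method="rhosigma"):
    """Return (rho, sigma, rho_for_D) under one of the four options.

    beta_m : B_m, assumed here to be the coefficient from which rho is derived
    sigma  : two-step estimate of the disturbance standard deviation (> 0)
    method : "rhosigma", "rhotrunc", "rholimited", or "rhoforce"
    """
    rho = beta_m / sigma                    # unconstrained estimate (assumed formula)
    truncated = max(-1.0, min(1.0, rho))    # clip to the admissible range [-1, 1]

    if method == "rhoforce":
        # Keep the two-step estimate everywhere, even if it is inadmissible.
        return rho, sigma, rho
    if method == "rholimited":
        # Truncate only where rho enters D; use the raw estimate elsewhere.
        return rho, sigma, truncated
    if method == "rhotrunc":
        # Use the truncated rho in all computations; sigma is left alone.
        return truncated, sigma, truncated
    if method == "rhosigma":
        # Truncate rho and, if truncation occurred, make sigma consistent
        # with it: sigma_hat = B_m * rho_hat.
        if truncated != rho:
            sigma = beta_m * truncated
        return truncated, sigma, truncated
    raise ValueError("unknown method: " + method)
```

When the unconstrained estimate already lies in [−1,1], all four branches return the same values, which matches the statement that the options then have no effect.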
Other than method rhoforce, these are ad hoc methods of imposing the
constraint that a correlation must be between −1 and 1. Asymptotically, this
will always be true, but the standard two-step estimator does not impose the
constraint. With method rhoforce one accepts the two-step estimate of
rho and simply hopes not to get a nonpositive-definite VCE. In the rare case
when the VCE is not positive definite, the VCE is set to 0 and tests are
Method rhosigma was chosen as the default for heckman, twostep
based on the coverage probabilities for the null hypothesis from simulations
with disturbance variances and sample sizes likely to generate estimates of
rho outside the admissible range. The do-file that performs these simulations
can be obtained by
The parameters controlling number of observations, expected censoring rate, rho,
and number of simulation repetitions are set by global macros at the top of
the file. The model for the simulations has three covariates in the regression
equation, and two covariates in the selection equation. Some of these
covariates are correlated both within and across equations; see the do-file
for how the data are simulated.
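For readers who want a feel for the data-generating process without opening the do-file, here is a much-simplified sketch in Python. It is not the do-file's design: it uses one covariate per equation rather than three and two, and every coefficient, the seed, and the function name simulate_selection are hypothetical choices made only for illustration. The function arguments play the role that the global macros play in the do-file.

```python
import math
import random


def simulate_selection(n, rho, seed=12345):
    """Draw n observations from a simple selection model (hypothetical design).

    The regression disturbance and the selection disturbance are standard
    normal with correlation rho, which must lie in [-1, 1].
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)     # regression covariate
        z = rng.gauss(0.0, 1.0)     # covariate that appears only in selection
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        u_sel = e1
        # Mix e1 and e2 so that corr(u_reg, u_sel) = rho
        u_reg = rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        selected = (0.5 * x + 1.0 * z + u_sel) > 0.0
        # The outcome is observed only for selected (uncensored) observations
        y = 1.0 + 2.0 * x + u_reg if selected else None
        data.append({"x": x, "z": z, "selected": selected, "y": y})
    return data
```

With a zero intercept in the selection equation, roughly half of the observations are censored; shifting that intercept would change the expected censoring rate.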
Seven simulations were performed:
The full set of simulation results can be viewed by
With data that are generated from the selection model, we rarely encounter a
VCE that is not positive definite (recall this is possible only with the
rhoforce option). In the simulations, this occurred only for the very
small dataset with 50 observations and an expected 20 observations that were
not censored. Still, this was with "true" Heckman data, and we might be
more likely to observe nonpositive-definite VCEs in real data.
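To make "not positive definite" concrete, the following Python sketch tests a symmetric matrix by attempting a Cholesky factorization, which succeeds only for positive-definite matrices. It is a generic check under our own assumptions, not the test heckman performs internally; the function name and tolerance are hypothetical.

```python
import math


def is_positive_definite(V, tol=1e-12):
    """Return True if the symmetric matrix V (list of lists) is positive definite.

    Attempts a Cholesky factorization V = L L'. The factorization fails,
    and the function returns False, as soon as a pivot is not strictly
    positive; a negative diagonal element of V always triggers this.
    """
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # nonpositive pivot: not positive definite
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

A matrix with a negative element on the diagonal fails at the corresponding pivot, but so can a matrix whose diagonal is entirely positive, such as [[1, 2], [2, 1]].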
With very small samples, the methods rhotrunc and rhosigma
perform somewhat better than rhoforce and rholimited, but once we get
beyond the information content of 100 observations in our simulated model,
there is little to choose from among any of the two-step methods. Although there
is little difference between rhotrunc and rhosigma, where there is a difference
at all, method rhosigma consistently had coverage rates closest to nominal.
One surprising result was the relative performance of the two-step estimator
and MLE on small samples. The MLE produced coverage rates worse than any of
the two-step estimates when samples were small. Coverage rates for both
two-step and MLE indicate standard errors are anticonservative for small
samples. The two-step estimates, however, are much closer to nominal than the
MLEs and for some methods of computing rho approach nominal coverage with as
few as 100 total observations and 40 expected uncensored observations.
Coverage rates for the MLEs do not approach nominal levels for the models
simulated until 200 or 300 observations are available.
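For reference, a coverage rate of the kind discussed here can be computed from simulation output as the share of replications whose confidence interval contains the true parameter. The Python sketch below assumes a normal-based 95% interval and hypothetical input lists of per-replication estimates and standard errors; it is not taken from the do-file.

```python
def coverage_rate(estimates, std_errors, true_value, z=1.959964):
    """Fraction of replications whose interval b +/- z*se covers true_value.

    estimates  : coefficient estimates, one per replication
    std_errors : matching estimated standard errors
    z          : critical value; 1.959964 gives a nominal 95% interval
    """
    hits = sum(
        1
        for b, se in zip(estimates, std_errors)
        if abs(b - true_value) <= z * se
    )
    return hits / len(estimates)
```

A rate noticeably below the nominal 0.95 means the standard errors are too small on average, which is what "anticonservative" refers to above.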
We note that the MLE gives a substantial hint when there may be coverage
problems. In these cases it produces estimates of rho that are −1 or 1 and
often has difficulty converging. Those are clear hints that the estimation
sample may not follow the assumptions of the model or that we have
insufficient information for the asymptotic properties of MLE to apply. With
small samples, this may be fairly common. What is surprising is that the
two-step estimator of the VCE produces good coverage rates even in these
cases.
- Greene, W. H. 1981. Sample selection bias as a specification error:
Comment. Econometrica 49(3): 795–798.
Also see the references for [R] heckman