
How are estimates of rho outside the bounds [−1,1] handled in the two-step Heckman estimator?

Title   Computation of rho, disturbance correlation, in the two-step Heckman estimator.
Technical FAQ
Author Vince Wiggins, StataCorp

Technical FAQ:   This is a technical FAQ. Technical FAQs address specific issues or computational aspects of estimators or other commands. They are typically written for someone who already knows and fully understands the statistics of the command—an expert in the area. All the materials in the Stata manual, including the Methods and Formulas and sometimes the references, are assumed to be understood.

These FAQs often deal with issues that are not considered or adequately addressed in the literature, and, as such, we welcome insights from readers or related citations that we may have missed.

Stata’s maximum likelihood estimator for a regression model with selection constrains the estimated correlation between the disturbances of the regression and selection equations to lie in the admissible range of a correlation, [−1,1]. This estimator is implemented by heckman, which can also fit the same model using Heckman’s (1979) two-step estimator with its associated variance–covariance matrix (VCE). One issue with the two-step estimator is that it can produce estimates of rho that lie outside the range [−1,1], which can, in rare cases, lead to an estimated VCE that is not positive definite and may even have negative elements on the diagonal.

Greene (1981) notes this possibility but offers no solution. Stata provides four ways of handling two-step estimates of rho outside the admissible range. These are options that work with heckman ..., twostep. From the heckman help file:

rhosigma, rhotrunc, rholimited, and rhoforce are rarely used options to specify how the two-step estimator (option twostep) handles unusual cases in which the two-step estimate of rho is outside the admissible range for a correlation, [−1,1]. When rho is outside this range, the two-step estimate of the coefficient variance–covariance matrix may not be positive definite and thus may be unusable for testing. The default is rhosigma.

rhotrunc specifies that rho be truncated to lie in the range [−1,1]. If the two-step estimate is less than −1, rho is set to −1; if the two-step estimate is above 1, rho is set to 1. This truncated value of rho is used in all computations to estimate the two-step covariance matrix.

rhosigma specifies that rho be truncated, as with option rhotrunc, and that the estimate of sigma be made consistent with rho_hat, the truncated estimate of rho. So, sigma_hat = B_m * rho_hat; see the Methods and Formulas section of [R] heckman for the definition of B_m. Both the truncated rho and the new estimate of sigma_hat are used in all computations to estimate the two-step covariance matrix.

rholimited specifies that rho be truncated only in the computation of diagonal matrix D as it enters V_twostep and Q; see [R] heckman Methods and Formulas. In all other computations the untruncated estimate of rho is used.

rhoforce specifies that the two-step estimate of rho be retained even if it is outside the admissible range for a correlation. This may, in rare cases, lead to a nonpositive-definite covariance matrix.

These options have no effect when estimation is by maximum likelihood, the default. They also have no effect when the two-step estimate of rho is in the range [−1,1].
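The four rules above can be summarized in a short sketch. This is only an illustration in Python, not Stata's implementation: the function name apply_rho_option is hypothetical, and the relation rho_hat = B_m / sigma_hat is assumed here as the definition of the two-step estimate of rho (see the Methods and Formulas section of [R] heckman for the authoritative definitions).

```python
def apply_rho_option(option, beta_m, sigma):
    """Return (rho, sigma, rho_in_D): the values a given option would use.

    Hypothetical helper for illustration only.
    rho      -- value of rho used in most VCE computations
    sigma    -- value of sigma used in the VCE computations
    rho_in_D -- value of rho used when forming the diagonal matrix D
    """
    rho_raw = beta_m / sigma  # assumed two-step estimate: B_m / sigma_hat

    # No option has any effect when the estimate is already admissible.
    if -1.0 <= rho_raw <= 1.0:
        return rho_raw, sigma, rho_raw

    rho_trunc = max(-1.0, min(1.0, rho_raw))  # truncate to [-1, 1]

    if option == "rhotrunc":
        # Truncated rho everywhere; sigma left as estimated.
        return rho_trunc, sigma, rho_trunc
    if option == "rhosigma":
        # Truncated rho everywhere; sigma made consistent: B_m * rho_hat.
        return rho_trunc, beta_m * rho_trunc, rho_trunc
    if option == "rholimited":
        # Truncated rho only inside D; untruncated elsewhere.
        return rho_raw, sigma, rho_trunc
    if option == "rhoforce":
        # Keep the inadmissible estimate as is.
        return rho_raw, sigma, rho_raw
    raise ValueError(f"unknown option: {option}")
```

For example, with B_m = −3 and sigma_hat = 2 the raw estimate of rho is −1.5. Under rhosigma the sketch returns rho = −1 and sigma = 3, whereas under rholimited it keeps rho = −1.5 and sigma = 2 and uses −1 only in D.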

Other than method rhoforce, these are ad hoc methods of imposing the constraint that a correlation must be between −1 and 1. Asymptotically, this will always be true, but the standard two-step estimator does not impose the constraint. With method rhoforce one accepts the two-step estimate of rho and simply hopes not to get a nonpositive-definite VCE. In the rare case when the VCE is not positive definite, the VCE is set to 0 and tests are disallowed.
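As a rough illustration of the check described above, the following Python sketch tests a small symmetric matrix for positive definiteness by attempting a Cholesky factorization, and replaces the matrix with zeros when the test fails. The function names and the tolerance are assumptions made for this example; how heckman performs the check internally is not stated here.

```python
import math


def is_positive_definite(m, tol=1e-12):
    """Return True if the symmetric matrix m (list of lists) is positive definite.

    Attempts a Cholesky factorization; a nonpositive pivot means failure.
    """
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= tol:
                    return False
                chol[i][j] = math.sqrt(pivot)
            else:
                chol[i][j] = (m[i][j] - s) / chol[j][j]
    return True


def usable_vce(vce):
    """Return vce unchanged if positive definite, else a zero matrix."""
    if is_positive_definite(vce):
        return vce
    return [[0.0] * len(vce) for _ in vce]
```

A matrix with a negative diagonal element, such as [[-1, 0], [0, 1]], fails at the first pivot, which matches the symptom of negative variances mentioned earlier.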

Method rhosigma was chosen as the default for heckman, twostep based on the coverage probabilities for the null hypothesis from simulations with disturbance variances and sample sizes likely to generate estimates of rho outside the admissible range. The do-file that performs these simulations can be obtained by clicking here. The parameters controlling number of observations, expected censoring rate, rho, and number of simulation repetitions are set by global macros at the top of the file. The model for the simulations has three covariates in the regression equation, and two covariates in the selection equation. Some of these covariates are correlated both within and across equations; see the do-file for how the data are simulated.
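For readers who want to see what a coverage rate measures, here is a minimal Python sketch. It is not taken from the do-file: the function name, the 95% nominal level, and the normal critical value 1.96 are assumptions for illustration. Given one point estimate and one standard error per simulation repetition, it reports the share of repetitions whose confidence interval contains the true parameter value.

```python
def coverage_rate(estimates, std_errors, true_value, z=1.96):
    """Share of repetitions whose interval b +/- z*se covers true_value.

    z = 1.96 corresponds to a nominal 95% interval under normality
    (an assumption made for this sketch).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equal-length, nonempty inputs")
    hits = sum(
        1
        for b, se in zip(estimates, std_errors)
        if abs(b - true_value) <= z * se
    )
    return hits / len(estimates)
```

A rate well below the nominal level means the standard errors are too small (anticonservative); a rate near the nominal level means the estimated VCE is behaving as intended.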

Seven simulations were performed:

Simulation      N      Rho     Selection      Simulation
                               probability    repetitions
    1          50     0.80        0.4            1000
    2         100    −0.85        0.4            1000
    3         100     0.40        0.8            1000
    4         200     0.60        0.6            1000
    5         300     0.60        0.6            1000
    6        1000     0.60        0.4            1000
    7        1000     0.40        0.8            1000


The full set of simulation results can be viewed by clicking here.

With data that are generated from the selection model, we rarely encounter a VCE that is not positive definite (recall this is possible only with the option rhoforce). In the simulations, this occurred only for the very small dataset with 50 observations and an expected 20 observations that were not censored. Still, this was with "true" Heckman data, and we might be more likely to observe nonpositive-definite VCEs in real data.
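The expected number of uncensored observations for each design is the sample size multiplied by the selection probability. The short Python sketch below (the function name is hypothetical) reproduces that arithmetic for the designs listed in the table of simulations.

```python
def expected_uncensored(n_obs, selection_prob):
    """Expected count of uncensored observations, rounded to an integer."""
    return round(n_obs * selection_prob)


# (N, selection probability) for simulations 1 through 7
designs = [(50, 0.4), (100, 0.4), (100, 0.8), (200, 0.6),
           (300, 0.6), (1000, 0.4), (1000, 0.8)]

expected = [expected_uncensored(n, p) for n, p in designs]
```

The first design gives 20 expected uncensored observations and the second gives 40, which are the small-sample cases discussed in the text.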

With very small samples, the methods rhotrunc, rholimited, and rhosigma perform somewhat better than rhoforce, but once we get beyond the information content of 100 observations in our simulated model, there is little to choose among the two-step methods. There is little difference among rhotrunc, rholimited, and rhosigma, but where there is a difference at all, method rhosigma consistently had coverage rates closest to nominal.

One surprising result was the relative performance of the two-step estimator and MLE on small samples. The MLE produced coverage rates worse than any of the two-step estimates when samples were small. Coverage rates for both two-step and MLE indicate standard errors are anticonservative for small samples. The two-step estimates, however, are much closer to nominal than the MLEs and for some methods of computing rho approach nominal coverage with as few as 100 total observations and 40 expected uncensored observations. Coverage rates for the MLEs do not approach nominal levels for the models simulated until 200 or 300 observations are available.

We note that the MLE gives a substantial hint when there may be coverage problems. In these cases, it produces estimates of rho that are −1 or 1 and often has difficulty converging. Those are clear hints that the estimation sample may not follow the assumptions of the model or that we have insufficient information to rely on the asymptotic properties of the MLE. With small samples, this may be fairly common. What is surprising is that the two-step estimator of the VCE produces good coverage rates even in these cases.

Reference

Greene, W. H. 1981. Sample selection bias as a specification error: Comment. Econometrica 49: 795–798.
Also see the references for [R] heckman.