Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Re: Two part model vs Tobit vs Heckman

From: Jeffrey Wooldridge
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: Two part model vs Tobit vs Heckman
Date: Wed, 29 Jun 2011 11:01:07 -0400

```
Hi Maarten:

Hope you don't mind if I weigh in on this; it is one of the rare times
I'm looking at the listserve and might have something to contribute.

It seems common for people to confuse sample selection problems and
two-part models for corners. It appears Andrew has the latter: some
people choose not to exercise, so there is a corner at zero. Of course
a standard Tobit model is a good starting point for such a variable
(if not linear regression as more of a data-descriptive device). It
is not a sample selection problem because there is no missing data. In
your wage example, we simply do not observe wage for those out of the
work force. We're not saying the wage offer is zero; we just don't
know what it is. So the Heckman approach applied to log(wage) makes
perfect sense.

In a corner case, the issue is getting a better model for D(y|x) when
y equals zero with positive probability. Two-part models are good for
that. The sort proposed
by Cragg -- truncated normal and lognormal -- do make a conditional
independence assumption, but they allow all covariates to appear in
both parts.

The Heckman-type approach applied to a corner -- what I call the
"exponential Type II Tobit model" in the second edition of my MIT
Press book -- only makes sense when log(y) is effectively treated as
the dependent variable. (This is, of course, conditional on y > 0.) It
makes no sense to apply the Heckman approach to y itself, as one can
easily get negative predictions for y.

Plus, the Heckman approach really only works with an exclusion
restriction; otherwise the correlation coefficient, rho, is not well
identified (and neither are other parameters). Is it better to make an
exclusion restriction or set rho to zero? I don't know; it's an
empirical question decided on a case-by-case basis.

In my empirical example in 2e of my book, I find that for hours
worked, the Cragg truncated normal hurdle model fits much better than
the Heckman model. Because these models are nonnested, a Vuong test
should be used. The conditional lognormal distribution is easily
rejected.

Jeff

On Wed, Jun 29, 2011 at 5:26 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> --- Andrew Tan Khee Guan wrote me privately:
>> In a paper I'm writing on physical activity, I used the Heckman to model
>> participation likelihood and duration on physical activity. However, the
>> journal reviewer wants me to consider the Tobit model as well as the
>> Two-Part model (Probit/OLS).
> <snip>
>> However, when comparing the Two-Part model, am I right to use the following
>> Stata command?
>>
>> probit dep $indep
>> mfx
>> reg dep $indep if dep>0
>>
>> Is that all? But how would I be able to compare across models? While the
>> marginal effect of the probability of the Two-Part model is comparable to
>> both Heckman and Tobit, how would the single OLS equation compare to the
>> conditional and unconditionals of the former?
>>
>> Is there a test to choose between the three estimators?
>
> Such questions must be sent to the statalist rather than individual
> members. The reasons for that are explained here:
> <http://www.stata.com/support/faqs/res/statalist.html#private>
>
> The two part model is a Heckman model that assumes that there is no
> correlation between the error terms of the selection equation and the
> "wage/activity" equation. You can estimate this simply by estimating
> separate probit and linear regression models (as you already did), or
> you can use -heckman- and impose the constraint that the correlation
> between error terms is 0. I wrote a Stata tip on imposing these types
> of constraints in Stata (M.L. Buis, forthcoming). Using -heckman-
> with the constraint has the advantage that you can use exactly the
> Heckman model.
>
> The test that compares the Heckman model with the two part model is
> reported at the bottom of the output of -heckman- for your regular
> Heckman model (the line that starts with "LR test of indep. eqns. (rho
> = 0):").
>
> In the example below you can see that estimating the two models
> separately or constraining the correlation to be 0 results in the same
> parameters:
>
> *-------------- begin example ---------------
> webuse womenwk, clear
>
> gen byte sel = !missing(wage)   // 1 if wage is observed, 0 otherwise
>
> // two part model using -reg- and -probit-
> reg wage educ age
> probit sel married children educ age
>
> // two part model using -heckman-
> constraint 1 [athrho]_b[_cons] = 0
> heckman wage educ age,                    ///
>        select(married children educ age) ///
>        constraint(1)
> *--------------- end example ----------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> Hope this helps,
> Maarten
>
> M.L. Buis (forthcoming) "Stata tip 97: Getting at rhos and sigmas",
> The Stata Journal, 11(2).
> <http://www.maartenbuis.nl/publications/sigma_rho.html>
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
>
> http://www.maartenbuis.nl
> --------------------------
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
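
As a rough illustration of the distinction drawn in the thread, the following Stata sketch fits a Tobit, two Cragg-type hurdle models, and the Heckman-type model for log(y) to a simulated corner-solution outcome. Everything about the data is an assumption made for illustration: the variable names (`x`, `z`, `d`, `y`, `lny`), the coefficients, and the seed are hypothetical and do not come from the posts. It also assumes Stata 14 or later, where `churdle` is built in; it is not the code used for the empirical example mentioned above.

```stata
* Simulated corner-solution data (all names and parameter values are hypothetical)
clear
set seed 20110629
set obs 2000
gen x = rnormal()
gen z = rnormal()
gen byte d = (0.5 + 0.8*x + 0.6*z + rnormal() > 0)   // participation decision
gen y = cond(d, exp(1 + 0.3*x + 0.5*rnormal()), 0)   // positive amount if d==1, else 0

* Standard Tobit with a corner at zero
tobit y x z, ll(0)

* Cragg truncated normal hurdle: all covariates appear in both parts
churdle linear y x z, select(x z) ll(0)

* Cragg lognormal hurdle: same covariates, lognormal amount equation
churdle exponential y x z, select(x z) ll(0)

* Heckman-type approach applied to log(y), defined only for y > 0;
* z is excluded from the amount equation (an exclusion restriction)
gen lny = ln(y) if y > 0
heckman lny x, select(d = x z)
```

Note that the `heckman` fit here is for log(y), while `tobit` and `churdle linear` model y itself, so their reported log likelihoods are not on the same scale and should not be compared without adjustment. The sketch only shows how each model might be specified.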