# st: Re: Interpretation of OLS coeff after Heckman selection

From: "Scott Merryman"; Subject: st: Re: Interpretation of OLS coeff after Heckman selection; Date: Fri, 29 Aug 2003 08:54:00 -0500

```
----- Original Message -----
From: <Christer.Thrane@hil.no>
To: <statalist@hsphsun2.harvard.edu>
Sent: Friday, August 29, 2003 5:05 AM
Subject: st: Interpretation of OLS coeff after Heckman selection

> Hi everyone,
>
> My dependent variable, Y, is the log of expenditures and a set of dummies
> (X1, X2, ...) are the explanatory variables of main concern. I also have a
> bunch of controls.
>
> Since sample selection is a problem, I use the Heckman command. (Tobit does
> not work with these data.)
>
> Recently someone pointed out to me the following: One cannot interpret the
> OLS coefficients for X1, X2, ... in the consumption equation the usual way
> (here: as semilogarithmic coefficients that need the adjustment suggested
> by Halvorsen and Palmquist [1980]) WHEN X1, X2, ... also are included as
> explanatory variables in the (probit) selection equation (which they are in
> my case). In this case, the OLS coefficients in the consumption equation need
> to be adjusted according to some kind of formula....
>
> Is this true? If yes, has anyone seen such a formula? Finally, has anyone
> written a command or a ado/do file to perform this adjustment in Stata?
>
> Thanks for any help!
>
> Christer
>

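Yes, it is true. When a regressor appears in both equations, its marginal
effect on E[Y | selected] has two parts: the direct effect through the outcome
equation and an indirect effect through the selection equation, which enters
via the inverse Mills' ratio. (See Greene's Econometric Analysis.)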

I believe the correct procedure is as follows:

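If x appears in both equations, with coefficient beta in the outcome equation
and coefficient alpha in the selection equation, then

dE[y | z*>0]/dx = beta - (alpha*rho*sigma*delta)

where delta = lambda*(lambda + zg), lambda is the inverse Mills' ratio and zg
is the linear prediction from the selection equation. Because delta varies
across observations, so does the marginal effect; the example below averages
it over the sample.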

Example

. use http://www.stata-press.com/data/r8/womenwk.dta

. heckman wage educ age, select(married children educ age) mills(mills)

Iteration 0:   log likelihood = -5178.7009
Iteration 1:   log likelihood = -5178.3049
Iteration 2:   log likelihood = -5178.3045

Heckman selection model                         Number of obs      =      2000
(regression model with sample selection)        Censored obs       =       657
                                                Uncensored obs     =      1343

                                                Wald chi2(2)       =    508.44
Log likelihood = -5178.304                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9899537   .0532565    18.59   0.000     .8855729    1.094334
         age |   .2131294   .0206031    10.34   0.000     .1727481    .2535108
       _cons |   .4857752   1.077037     0.45   0.652    -1.625179     2.59673
-------------+----------------------------------------------------------------
select       |
     married |   .4451721   .0673954     6.61   0.000     .3130794    .5772647
    children |   .4387068   .0277828    15.79   0.000     .3842534    .4931601
   education |   .0557318   .0107349     5.19   0.000     .0346917    .0767718
         age |   .0365098   .0041533     8.79   0.000     .0283694    .0446502
       _cons |  -2.491015   .1893402   -13.16   0.000    -2.862115   -2.119915
-------------+----------------------------------------------------------------
     /athrho |   .8742086   .1014225     8.62   0.000     .6754241    1.072993
    /lnsigma |   1.792559    .027598    64.95   0.000     1.738468     1.84665
-------------+----------------------------------------------------------------
         rho |   .7035061   .0512264                      .5885365    .7905862
       sigma |   6.004797   .1657202                       5.68862    6.338548
      lambda |   4.224412   .3992265                      3.441942    5.006881
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    61.20   Prob > chi2 = 0.0000
------------------------------------------------------------------------------

. predict select_xb , xbs

. gen delta = mills*(mills + select_xb)

. gen b_age = [wage]_b[age] - ([select]_b[age]*e(rho)*e(sigma)*delta)

. ci b_age

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       b_age |       2000    .1391227    .0006604        .1378276    .1404179

Hope this helps,
Scott

```
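The formula used in the `gen delta` and `gen b_age` lines can be cross-checked numerically outside Stata. The sketch below (Python, standard library only) plugs the point estimates printed in the `heckman` output into E[wage | selected] = x'beta + rho*sigma*lambda(z'gamma), computes the age effect with the analytic formula beta - alpha*rho*sigma*delta, where delta = lambda*(lambda + z'gamma), and compares it with a central finite difference of that conditional mean. The helper names (`mills`, `index`, `cond_mean`, `marginal_effect`) and the covariate values in `woman` are hypothetical, chosen only for illustration. Unlike the Stata steps, which average over all observations, this evaluates the effect at a single covariate profile.

```python
import math

# Point estimates copied from the -heckman- output (Stata womenwk example).
BETA = {"_cons": 0.4857752, "education": 0.9899537, "age": 0.2131294}  # wage equation
GAMMA = {"_cons": -2.491015, "married": 0.4451721, "children": 0.4387068,
         "education": 0.0557318, "age": 0.0365098}                       # select equation
RHO, SIGMA = 0.7035061, 6.004797


def mills(c):
    """Inverse Mills' ratio lambda(c) = phi(c) / Phi(c) for the standard normal."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return pdf / cdf


def index(coefs, x):
    """Linear prediction: constant plus coef*value summed over the equation's variables."""
    return coefs["_cons"] + sum(b * x[k] for k, b in coefs.items() if k != "_cons")


def cond_mean(x):
    """E[wage | selected] = x'beta + rho*sigma*lambda(z'gamma)."""
    return index(BETA, x) + RHO * SIGMA * mills(index(GAMMA, x))


def marginal_effect(var, x):
    """Analytic effect beta - alpha*rho*sigma*delta, with delta = lambda*(lambda + z'gamma).

    A variable missing from an equation gets a coefficient of zero there.
    """
    zg = index(GAMMA, x)
    lam = mills(zg)
    delta = lam * (lam + zg)
    return BETA.get(var, 0.0) - GAMMA.get(var, 0.0) * RHO * SIGMA * delta


# Hypothetical covariate profile, for illustration only.
woman = {"married": 1, "children": 2, "education": 12, "age": 40}

me_analytic = marginal_effect("age", woman)

# Central finite difference of the conditional mean with respect to age.
h = 1e-5
up = dict(woman, age=woman["age"] + h)
down = dict(woman, age=woman["age"] - h)
me_numeric = (cond_mean(up) - cond_mean(down)) / (2 * h)

print(f"analytic {me_analytic:.6f}  numeric {me_numeric:.6f}  beta_age {BETA['age']}")
```

The two numbers should agree to several decimal places, which confirms the sign and the "+" inside delta. Note also that RHO*SIGMA reproduces the `lambda` row of the output (about 4.2244). Averaging `marginal_effect` over every observation's covariates, as the `gen`/`ci` steps do, would give a sample-average effect rather than the effect at one profile.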