The Stata listserver

Re: st: Heckman command sample issue


From   "Mark Schaffer" <[email protected]>
To   [email protected]
Subject   Re: st: Heckman command sample issue
Date   Tue, 20 Apr 2004 16:16:44 +0100

Nicolas,

From:           	"Nicolas Theopold" <[email protected]>
To:             	<[email protected]>
Subject:        	st: Heckman command sample issue
Date sent:      	Tue, 20 Apr 2004 10:45:53 +0100
Send reply to:  	[email protected]

> Dear Statalist users,
> 
> please excuse me if this is a basic question; however, I could not find
> the answer, or the right search terms to find it, in the archives.
> I have a data set with about 600,000 observations, of which about 90,000
> are wage earners. On these data, I would like to run the heckman command
> to correct for sample selectivity bias. My earnings variable is
> ln(wages), and it therefore has missing values for all observations that
> do not have a recorded wage.

I don't see why this should be a problem.  According to -help heckman-:

>     By default, heckman will assume that missing values (see help
>     missing) of depvar imply that the dependent variable is unobserved
>     (not selected).  With some datasets it is more convenient to
>     specify a binary variable (depvar_s) that identifies the
>     observations for which the dependent is observed/selected
>     (depvar_s!=0) or not observed (depvar_s==0); heckman will
>     accommodate either type of data.

so you shouldn't be having any problems.  -heckman- should understand 
that in your case, a missing value for ln(wages) means "not 
selected".  Or am I missing something?
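
To illustrate the help text, here is a minimal do-file sketch on simulated
data. The variable names (educ, kids, ilf, ln_wages) and the
data-generating process are hypothetical stand-ins, not your data. Under
these assumptions, both calls should use all 2,000 observations in the
selection equation (reported in e(N)), while the wage equation is fitted
only where ln_wages is non-missing:

```stata
* Hypothetical simulated data: names and parameters are illustrative only
clear
set seed 12345
set obs 2000
generate educ = rnormal(12, 2)           // regressor in both equations
generate kids = (runiform() < 0.4)       // exclusion restriction (selection only)
generate u    = rnormal()                // selection-equation error
generate eps  = 0.5*u + rnormal()        // wage error, correlated with u
generate ilf  = (0.2 + 0.1*educ - 0.4*kids + u > 1)
generate ln_wages = 1 + 0.08*educ + eps if ilf == 1   // missing if not working

* (1) Default: a missing ln_wages is read as "not selected"
heckman ln_wages educ, select(educ kids)
display e(N)

* (2) Same model with an explicit 0/1 selection indicator
heckman ln_wages educ, select(ilf = educ kids)
display e(N)
```

If e(N) comes out smaller than expected on real data, -tabulate ilf,
missing- and -count if e(sample)- are quick ways to see which observations
were dropped, for example because of missing values in ilf or in the
regressors.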

--Mark

> Hence, if I run heckman as:
> 
> heckman ln_wages varlist1, sel(ilf = varlist1 varlist2)
> 
> where varlist2 contains my exclusion restrictions and ilf is a
> participation dummy, the number of observations drops to 90,000 (ilf
> does not take only the value 1, since I have not yet finished cleaning
> the data).
> My aim is to run the selection equation on the whole sample (of 600,000
> obs) and to get Stata to run the wage equation without the missing
> observations. The only solution I have found is to replace all missing
> values of ln_wages with -9999.
> 
> Is there a better way to tell Stata to use the whole sample for the
> selection equation, even when ln_wages is missing?
> 
> Thank you very much in advance for your help.
> 
> Nicolas
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


Prof. Mark E. Schaffer
Director
Centre for Economic Reform and Transformation
Department of Economics
School of Management & Languages
Heriot-Watt University, Edinburgh EH14 4AS  UK
44-131-451-3494 direct
44-131-451-3008 fax
44-131-451-3485 CERT administrator
http://www.som.hw.ac.uk/cert


