
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: heckprob using multiple imputation


From   Marcus L Britton <[email protected]>
To   [email protected]
Subject   Re: st: heckprob using multiple imputation
Date   Tue, 18 Mar 2014 09:56:34 -0500 (CDT)

I share Klaus's interest in this issue. Any guidance from those more knowledgeable than I am would be greatly appreciated.

Klaus, you may already be aware of this post on CrossValidated: http://stats.stackexchange.com/questions/65678/using-heckman-in-combination-with-mi-estimate-stata

But if not, perhaps it will be helpful.

Marcus Britton


------------------------------

Date: Sat, 15 Mar 2014 15:14:20 +0100
From: Klaus Nowotny <[email protected]>
Subject: st: heckprob using multiple imputation

Dear statalist users,

I want to estimate a probit model where y is a function of income and 
other explanatory variables X. However, y is only observed for a subset 
of observations where z==1, so I want to estimate a probit model with 
sample selection:

heckprob y income X, select(z = income X W)

where W is a set of variables not related to y.

My problem is that income is unobserved for about 25% of all 
observations (and about 24% of the observations where z==1), a problem I 
want to solve using multiple imputation. Now the MI literature 
recommends that all variables used in the subsequent analysis should 
also be included in the imputation model, including the dependent 
variable. But what if the dependent variable is not observed for the 
full sample? Is it okay to impute (log-)income as follows?

mi impute regress income X W z
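
For readers who want to try this first option end to end, here is a minimal sketch on simulated data. It is an illustration under stated assumptions, not a recommendation on whether y belongs in the imputation model. The lowercase x and w stand in for the variable lists X and W, and the coefficients, the 25% missingness rate, the 20 imputations, and the seeds are all arbitrary choices made for the example. The cmdok option is included only in case heckprob is not on the list of commands that mi estimate supports in the installed Stata version.

```stata
* Simulated data: every number below is an arbitrary illustrative choice
clear
set seed 2014
set obs 2000
generate x = rnormal()
generate w = rnormal()
generate income = 1 + 0.5*x + rnormal()

* Selection indicator z depends on income, x, and the excluded variable w
generate z = (0.3 + 0.5*income + 0.4*x + 0.8*w + rnormal() > 0)

* Outcome y is observed only where z == 1 (missing otherwise)
generate y = (-0.2 + 0.6*income + 0.3*x + rnormal() > 0) if z == 1

* Make income missing completely at random for about 25% of observations
replace income = . if runiform() < 0.25

* Declare the mi structure and impute income from x, w, and z
mi set wide
mi register imputed income
mi register regular x w z
mi impute regress income x w z, add(20) rseed(2014)

* Fit the probit with sample selection on each imputed dataset and combine
mi estimate, cmdok: heckprob y income x, select(z = income x w)
```

In this sketch the errors of the two equations are simulated as independent, so the estimated correlation should be close to zero. It says nothing about how the approach behaves when they are correlated.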

Or would I have to impute both income and y using, for example, 
multivariate normal regression?

mi impute mvn y income = X W z

Or would it be better to jointly model the probability that z==1 & 
income!=. as the selection step in the probit with sample selection?

gen v = (z==1 & !missing(income))
heckprob y income X, select(v = income X W)

Even if there is no correlation between the error terms in the selection 
and outcome models (and my preliminary evidence suggests that this is 
the case), imputing income only for those observations where the 
dependent variable is observed would still be inefficient, since it 
does not use all of the available data.

Any help is greatly appreciated!
Klaus
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

------------------------------


