

From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: st: RE: Missing data on outcome and sample selection bias

Date: Mon, 1 Mar 2010 08:12:09 -0800

I don't understand why you can't impute outcome variables. ICE will do it. A recent paper by von Hippel notes that a reasonable approach is to impute all the missing values but then delete the cases whose y-values were originally missing before running the analysis. His simulations were for normal variables, but I wouldn't be surprised to see they held for categorical ones. Simply deleting cases without y values is often very dangerous. I'd use ICE and try it both ways. Note that ICE will impute categorical values.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rosie Chen
Sent: Monday, March 01, 2010 7:03 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Missing data on outcome and sample selection bias

Carlo, thanks for your response. My question is not related to right censoring or to missing values on the independent variables. It is about respondents who did not answer the question for the outcome variable. We can't impute outcome values, so that's why we often have to delete cases that have missing values on the dependent variable. But there is a potential sample selection bias.

So dear all, here are my several questions regarding a multilevel analysis with missing values on the outcome variable:

(1) Do we usually compare the deleted cases with the final raw sample without missing data imputation, or with the final sample with missing cases imputed?

(2) To what extent can t-tests be useful for determining sample selection bias? What criterion do we use? Do significant t-tests on all predictors indicate such a problem, or is it enough for half of the tests to be significant?

(3) If the t-test is not a very good tool for assessing the problem, should we use the Heckman method? Can we use the Heckman test to detect and remedy possible sample selection bias on a dependent variable in Stata? I learned that Stata has Heckman and GLLAMM commands, but I am not sure whether they can take all three features (multilevel data structure, multiply imputed data, and complex survey design) into account.

Your advice would be appreciated very much,

Rosie

----- Original Message ----
From: Carlo Lazzaro <carlo.lazzaro@tin.it>
To: statalist@hsphsun2.harvard.edu
Cc: Rosie Chen <jiarongchen2002@yahoo.com>
Sent: Mon, March 1, 2010 1:58:39 AM
Subject: R: Missing data analysis

Dear Rosie,

I am not clear about what you mean by "we have to delete cases that have missing values", since this is not the standard practice. If you mean (right-)censored observations, they can be addressed in Stata via the survival analysis suite (please see -stset- and related commands in Stata 9.2/SE).

For more details on dealing with missing observations, especially when they occur in covariates rather than outcomes, you might want to take a look at:

Little RJA, Rubin DB. Statistical analysis with missing data. Second Edition. Hoboken, NJ: Wiley, 2002.

HTH and Kind Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rosie Chen
Sent: Sunday, February 28, 2010 21:31
To: statalist@hsphsun2.harvard.edu
Subject: st: Missing data analysis

Hi, dear listserv members,

I have a question that is not specifically related to Stata, but I would like to try asking it here:

In most studies, we have to delete cases that have missing values on the outcome variable. The issue is whether the deleted cases are significantly different from the final sample we use, because of the potential sample selection bias problem. My question is: do we usually compare the deleted cases with the final raw sample without missing data imputation, or with the final sample with missing cases imputed?

Any suggestions are appreciated very much,

Rosie

* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
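As a rough illustration of the comparison raised in question (2), the sketch below splits cases by whether the outcome is missing and computes Welch's t statistic on one predictor. It is a minimal Python sketch, not Stata code and not anything proposed in the thread: the `records` data and the `welch_t` helper are hypothetical and exist only for this example.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Squared standard errors of the two sample means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical records: (predictor x, outcome y); None marks a missing outcome.
records = [(2.0, 1.1), (3.0, None), (4.0, 2.3), (5.0, None),
           (6.0, 2.9), (7.0, 3.8), (8.0, None), (9.0, 4.4)]

# Predictor values for cases with an observed outcome vs. a missing one.
x_observed = [x for x, y in records if y is not None]
x_missing = [x for x, y in records if y is None]

t, df = welch_t(x_observed, x_missing)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A large |t| on a predictor only indicates that missingness is related to that observed covariate. On its own it does not establish, or rule out, bias in the outcome model.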

**Follow-Ups**:
- **st: Panel LM Unit Root Test with Heterogenous Structural Breaks** *From:* "Burak Darbaz" <infinisafricae@hotmail.com>
- **Re: st: RE: Missing data on outcome and sample selection bias** *From:* Rosie Chen <jiarongchen2002@yahoo.com>

**References**:
- **st: R: Missing data analysis** *From:* "Carlo Lazzaro" <carlo.lazzaro@tin.it>
- **st: Missing data on outcome and sample selection bias** *From:* Rosie Chen <jiarongchen2002@yahoo.com>
