Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: Missing data on outcome and sample selection bias
Date: Tue, 2 Mar 2010 10:32:29 -0800

Retaining the cases with missing y values allows "better" imputation of the x variables, per von Hippel. If you omit them, the prediction model may be biased, since there may be some relationship that we don't understand. However, when we are estimating the regression relationship, the inclusion of imputed y's doesn't add anything. Hence von Hippel's recommendation: impute using all the data, but omit the cases with imputed y values from the regression. It is only a small improvement, but if you have a lot of missing y values, you can be in a lot of trouble.

Yulia Marchenko (?) at Stata may be able to help you also.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rosie Chen
Sent: Monday, March 01, 2010 8:32 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Missing data on outcome and sample selection bias

Thanks, Tony. Let me see if I understand you correctly. Did you mean that, by keeping cases that have missing values on the y variable in the imputation process, we should be able to reduce or remove the possible sample selection bias, because the imputed values of the x variables are based on those cases as well? I haven't seen anywhere that this is a standard way to address possible sample selection, but please correct me if I am wrong.

To keep this discussion thread going, I am posting my questions again. Thanks for any input and advice! -- Rosie

Dear all, here are my several questions regarding a multilevel analysis with missing values on the outcome variable:

(1) Do we often compare the deleted cases with the final raw sample without missing data imputation or with the final sample with missing cases imputed?

(2) To what extent can t-tests be useful for determining sample selection bias? What criterion do we use?
Do significant t-tests on all predictors indicate such a problem, or does half of the tests being significant indicate it?

(3) If the t-test is not a very good tool for assessing the problem, should we use the Heckman method? Can we use the Heckman test in Stata to detect and remedy possible sample selection bias on a dependent variable? I learned that there is Heckman and GLLAMM syntax in Stata, but I am not sure whether it can take all three features (multilevel data structure, multiply imputed data, and complex survey design) into consideration.

----- Original Message ----
From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Sent: Mon, March 1, 2010 11:12:09 AM
Subject: st: RE: Missing data on outcome and sample selection bias

I don't understand why you can't impute outcome variables. ICE will do it. A recent paper by von Hippel notes that a reasonable approach is to impute all the missing values but then delete the cases with missing y values. His simulations were for normal variables, but I wouldn't be surprised to see that they hold for categorical ones. Deleting cases without y values is often very dangerous. I'd use ICE and try it both ways. Note that ICE will impute categorical values.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rosie Chen
Sent: Monday, March 01, 2010 7:03 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Missing data on outcome and sample selection bias

Carlo, thanks for your response. My question is not related to right censoring or to missing values on independent variables. It is the fact that respondents did not answer the question for the outcome variable.
We can't impute outcome values, so that's why we often have to delete cases that have missing values on the dependent variable. But there is a potential sample selection bias.

So dear all, here are my several questions regarding a multilevel analysis with missing values on the outcome variable:

(1) Do we often compare the deleted cases with the final raw sample without missing data imputation or with the final sample with missing cases imputed?

(2) To what extent can t-tests be useful for determining sample selection bias? What criterion do we use? Do significant t-tests on all predictors indicate such a problem, or does half of the tests being significant indicate it?

(3) If the t-test is not a very good tool for assessing the problem, should we use the Heckman method? Can we use the Heckman test in Stata to detect and remedy possible sample selection bias on a dependent variable? I learned that there is Heckman and GLLAMM syntax in Stata, but I am not sure whether it can take all three features (multilevel data structure, multiply imputed data, and complex survey design) into consideration.

Your advice would be appreciated very much,
Rosie

----- Original Message ----
From: Carlo Lazzaro <carlo.lazzaro@tin.it>
To: statalist@hsphsun2.harvard.edu
Cc: Rosie Chen <jiarongchen2002@yahoo.com>
Sent: Mon, March 1, 2010 1:58:39 AM
Subject: R: Missing data analysis

Dear Rosie,

I am not clear about what you mean by "we have to delete cases that have missing values", since this is not standard practice. If you mean (right-)censored observations, they can be handled in Stata via the survival analysis suite (please see -stset- and related commands in Stata 9.2/SE).

For more details on dealing with missing observations, especially when the missing values are on predictor variables rather than outcomes, you might want to take a look at:

Little RJA, Rubin DB. Statistical analysis with missing data. Second Edition. Hoboken, NJ: Wiley, 2002.
HTH and Kind Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rosie Chen
Sent: Sunday, February 28, 2010 21:31
To: statalist@hsphsun2.harvard.edu
Subject: st: Missing data analysis

Hi, dear listserv members,

I have a question that is not specifically related to Stata, but I would like to try here. In most studies, we have to delete cases that have missing values on the outcome variable. The issue is whether the deleted cases are significantly different from the final sample we use, because of the potential sample selection bias problem. My question is: do we often compare the deleted cases with the final raw sample without missing data imputation or with the final sample with missing cases imputed?

Any suggestions are appreciated very much,
Rosie

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
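
For readers who want to see the "impute with all cases, then drop the cases whose y was imputed" idea from the top of this thread in concrete form, here is a rough sketch. It is written in Python with NumPy rather than Stata, it is not ICE, and everything in it is an assumption made for illustration: the simulated data (true slope 2, 30% of x and of y missing completely at random), the helper names `ols`, `impute_once`, and `pool`, and the number of imputations. The imputation step is a simplified chained regression that adds residual noise but does not draw the regression parameters, so the pooled standard error is probably a little optimistic. Treat it as a way to see the mechanics, not as a recommended implementation.

```python
import numpy as np

rng = np.random.default_rng(2010)

# Hypothetical simulated data: y = 1 + 2*x + noise
n = 2000
x_full = rng.normal(size=n)
y_full = 1.0 + 2.0 * x_full + rng.normal(size=n)

# Delete 30% of x and 30% of y completely at random
x_miss = rng.random(n) < 0.3
y_miss = rng.random(n) < 0.3
x_obs = np.where(x_miss, np.nan, x_full)
y_obs = np.where(y_miss, np.nan, y_full)


def ols(pred, resp):
    """OLS of resp on [1, pred]: coefficients, covariance matrix, residual SD."""
    X = np.column_stack([np.ones_like(pred), pred])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ resp
    resid = resp - X @ beta
    s2 = resid @ resid / (len(resp) - 2)
    return beta, s2 * xtx_inv, np.sqrt(s2)


def impute_once(x, y, x_miss, y_miss, rng, sweeps=10):
    """One simplified chained-regression imputation of x and y.

    ALL cases take part, including those with missing y.
    """
    x = np.where(x_miss, np.nanmean(x), x)  # start from mean fills
    y = np.where(y_miss, np.nanmean(y), y)
    for _ in range(sweeps):
        # Impute x from y, fitting on rows where x was observed
        b, _, sd = ols(y[~x_miss], x[~x_miss])
        x[x_miss] = b[0] + b[1] * y[x_miss] + rng.normal(scale=sd, size=x_miss.sum())
        # Impute y from x, fitting on rows where y was observed
        b, _, sd = ols(x[~y_miss], y[~y_miss])
        y[y_miss] = b[0] + b[1] * x[y_miss] + rng.normal(scale=sd, size=y_miss.sum())
    return x, y


def pool(estimates, variances):
    """Rubin's rules for one coefficient: pooled estimate and standard error."""
    m = len(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return np.mean(estimates), np.sqrt(within + (1 + 1 / m) * between)


m = 20  # number of imputations (arbitrary choice)
slopes, slope_vars = [], []
for _ in range(m):
    x_imp, y_imp = impute_once(x_obs, y_obs, x_miss, y_miss, rng)
    keep = ~y_miss  # deletion step: drop cases whose y was imputed
    beta, cov, _ = ols(x_imp[keep], y_imp[keep])
    slopes.append(beta[1])
    slope_vars.append(cov[1, 1])

slope, se = pool(slopes, slope_vars)
print(f"pooled slope = {slope:.3f} (SE {se:.3f}); the simulated true slope is 2")
```

To "try it both ways", as suggested above, one could replace `keep = ~y_miss` with a mask that keeps every row and compare the pooled results.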

**References**:

- **st: R: Missing data analysis**, *From:* "Carlo Lazzaro" <carlo.lazzaro@tin.it>
- **st: Missing data on outcome and sample selection bias**, *From:* Rosie Chen <jiarongchen2002@yahoo.com>
- **st: RE: Missing data on outcome and sample selection bias**, *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
- **Re: st: RE: Missing data on outcome and sample selection bias**, *From:* Rosie Chen <jiarongchen2002@yahoo.com>
