
From: Leonelo Bautista <lebautista@wisc.edu>
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: RE: Imputing values for categorical data
Date: Fri, 16 Apr 2004 09:48:24 -0500

I'd be very hesitant to use indicator variables to model missing values. The group of subjects with missing values will be a mixture of subjects from the other categories of the variable. Therefore, the relative risk (or the odds ratio) in this group would be biased (even if missing values occur at random). When combining the stratum-specific relative risks to obtain an adjusted relative risk, we would be summarizing biased and unbiased estimates, so the adjusted relative risk would be biased.

See "Vach W, Blettner M. Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. Am J Epidemiol 1991;134:895-907" for a good discussion of these issues.

Leonelo Bautista

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dupont, William
Sent: Friday, April 16, 2004 7:54 AM
To: bill magee; statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: Imputing values for categorical data

Bill

What you propose sounds reasonable to me. However, I recently submitted a paper that did this and was trashed by a referee, in part because of how I was handling missing values. Using an indicator variable for missing values has the advantage that it gives you some idea as to whether you are, in fact, dealing with nonignorable missing data. However, this approach does not appear to be in fashion at this time.

My own approach to data analysis is to attempt to use methods that

1. I think are reasonable,
2. are widely accepted within the biostatistical community, and
3. avoid ignoring or attacking sacred cows that are dear to likely referees.

My own sense is that, at least in medical statistics, multiple imputation is becoming a very popular way of dealing with missing data. I also feel it is a sensible approach, particularly if it is only used for confounding variables or if your study design gives you reason to believe that the missing data is missing at random.

I would be interested to know what other Statalisters think about using indicator variables to model missing values.

Bill

-----Original Message-----
From: bill magee [mailto:magee@chass.utoronto.ca]
Sent: Friday, April 16, 2004 5:11 AM
To: Dupont, William
Subject: RE: RE: RE: Imputing values for categorical data

Hi Bill -- Rather than imputing a missing categorical control or confounding variable, such as gender, wouldn't it usually be better to just include a category for missing (e.g. a dummy for female, a dummy for missing, with male as the excluded contrast group)?

bill magee

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
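The argument in the top message of this thread (a "missing" stratum is a mixture of the real categories, so pooling it with the real strata biases the adjusted estimate) can be illustrated with a small numerical sketch. It is not taken from the thread or from Vach and Blettner: the cell probabilities, the 30% missingness fraction, and all function names are hypothetical, and the Mantel-Haenszel estimator stands in for whatever stratified summary one would actually use. The sketch works with expected cell proportions rather than simulated data, so there is no sampling noise. The exposure has no effect on disease within either category of a binary confounder, and the confounder is missing completely at random.

```python
# Hypothetical illustration: a binary confounder with values missing
# completely at random, handled with a missing-value indicator stratum.

def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table (a=E+D+, b=E+D-, c=E-D+, d=E-D-)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across 2x2 strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def stratum_table(p_stratum, p_exposed, p_disease):
    """Expected cell proportions for one confounder category.

    Disease risk does not depend on exposure here, so the true
    within-stratum odds ratio is exactly 1.
    """
    return (
        p_stratum * p_exposed * p_disease,
        p_stratum * p_exposed * (1 - p_disease),
        p_stratum * (1 - p_exposed) * p_disease,
        p_stratum * (1 - p_exposed) * (1 - p_disease),
    )

# Assumed values: the confounder raises both exposure and disease risk.
full = [stratum_table(0.5, 0.2, 0.1), stratum_table(0.5, 0.8, 0.5)]
crude_table = tuple(sum(cells) for cells in zip(*full))

miss = 0.3  # assumed fraction with the confounder missing (MCAR)
observed = [tuple((1 - miss) * x for x in t) for t in full]
# Subjects with a missing confounder are a mixture of both categories,
# so their 2x2 table has the same shape as the crude table.
missing_stratum = tuple(miss * x for x in crude_table)

complete_case_or = mantel_haenszel_or(observed)
indicator_or = mantel_haenszel_or(observed + [missing_stratum])
crude_or = odds_ratio(*crude_table)

print(f"complete-case adjusted OR:     {complete_case_or:.2f}")
print(f"missing-indicator adjusted OR: {indicator_or:.2f}")
print(f"crude (unadjusted) OR:         {crude_or:.2f}")
```

Under these assumed numbers the complete-case estimate recovers the true odds ratio of 1, while the missing-indicator estimate comes out at about 1.67, pulled toward the crude odds ratio of about 3.30. The size of the bias depends entirely on the assumed confounding strength and missingness fraction.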

**References**: **st: RE: RE: RE: Imputing values for categorical data** *From:* "Dupont, William" <william.dupont@vanderbilt.edu>


© Copyright 1996–2014 StataCorp LP