
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: Imputing values for categorical data
Date: Thu, 15 Apr 2004 21:03:00 +0100

impute the missing reference here. In this case, it happens to be
a book I know about. (In other cases, in other postings, just giving
author surnames and dates makes the reference search difficult:
list members please note.)

Statistical Analysis With Missing Data, Second Edition
Roderick J. A. Little, Donald B. Rubin
ISBN: 0-471-18386-5
Wiley
September 2002

Nick
n.j.cox@durham.ac.uk

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Dupont,
> William
> Sent: 15 April 2004 20:47
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Imputing values for categorical data
>
>
> Jennifer
>
> In my opinion, imputation makes the most sense when we wish to adjust
> for confounding variables. Suppose that I am primarily interested in
> the relationship between y and x, and I have complete data on these
> two variables from my data set. I feel, however, that I should adjust
> my analysis for a number of other confounding covariates, and I know
> that missing values are scattered throughout these covariates. If I
> just regress y against x and these other covariates I get a complete
> case analysis: any record that is missing any value of these
> covariates is dropped from the analysis. This can lead to a
> substantial loss of power and has the potential to induce bias if
> having complete data is related to the response of interest. Suppose
> that one of my confounding variables is gender. If I have a number of
> records where y and x are known but gender is not, it does not seem
> sensible to throw out this information just because I would like to
> adjust my estimates for gender. If, however, I impute gender I can
> avoid losing these data. As long as gender is only in the model as a
> confounder, I don't see that it does much harm to have an imputed
> value of, say, 0.2 for some patient, which means that, based on her
> other covariates, she is 4 times more likely to be of one gender than
> the other.
>
> A tricky problem with imputation is that we often lack assurance that
> the missing values are missing at random. However, even in this
> situation, it is unclear that the complete case analysis is superior
> to an imputed analysis for the situation described above. Imputation
> becomes much more problematic when some variables of primary interest
> have missing values.
>
> The imputation gurus do not like the single conditional imputation
> provided by Stata (see for example Little and Rubin 2002). This is
> because this technique underestimates the standard error of the
> regression coefficient for covariates with imputed values and
> overestimates the degrees of freedom. Multiple imputation methods get
> around this problem and are fine as long as you are confident that the
> missing values are missing at random. If you are only using imputation
> for confounding variables I'm not convinced that it makes much
> difference how you do the imputation. However, multiple imputation is
> always theoretically preferable and can avoid hassles in the event
> that you come up against a referee who objects to all use of single
> conditional imputation.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
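
For anyone who wants to see the two numerical points in the quoted message concretely, here is a minimal sketch in Python rather than Stata. The six records, the column layout (y, x, gender, age) and the function names are all made up for illustration; nothing here comes from the original posters' data. It shows how a complete case analysis drops every record with any missing covariate even though y and x are fully observed, and how an imputed value of 0.2 for a 0/1 indicator translates into odds.

```python
# Hypothetical records (y, x, gender, age); None marks a missing value.
# y and x are observed for every record; only the covariates have gaps.
records = [
    (2.1, 1.0, 1, 34.0),
    (3.4, 2.0, None, 51.0),
    (1.9, 1.5, 0, None),
    (4.0, 3.0, 0, 45.0),
    (3.1, 2.5, None, None),
    (2.7, 2.0, 1, 29.0),
]


def complete_cases(rows):
    """Keep only rows with no missing value in any column (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row)]


def odds_of_other_category(p):
    """For an imputed value p of a 0/1 indicator, return (1 - p) / p,
    the odds that the record belongs to the category coded 0."""
    return (1.0 - p) / p


kept = complete_cases(records)
dropped = len(records) - len(kept)
print(f"records: {len(records)}, complete cases: {len(kept)}, dropped: {dropped}")

# An imputed value of 0.2 means probability 0.2 for one category and 0.8
# for the other, so the odds are 0.8 / 0.2.
print(f"odds for imputed value 0.2: {odds_of_other_category(0.2):.1f} to 1")
```

In this toy data half of the records would be lost to listwise deletion, although every one of them carries information on y and x. The sketch says nothing about standard errors; for that, the comparison of single and multiple imputation in Little and Rubin is the place to look.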
