
From: "Dupont, William" <william.dupont@vanderbilt.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Imputing values for categorical data
Date: Thu, 15 Apr 2004 14:47:15 -0500

Jennifer,

In my opinion, imputation makes the most sense when we wish to adjust for confounding variables. Suppose that I am primarily interested in the relationship between y and x, and that I have complete data on these two variables. I feel, however, that I should adjust my analysis for a number of confounding covariates, and I know that missing values are scattered throughout these covariates. If I just regress y against x and these other covariates I get a complete case analysis: any record that is missing a value for any of these covariates is dropped from the analysis. This can lead to a substantial loss of power and has the potential to induce bias if having complete data is related to the response of interest.

Suppose that one of my confounding variables is gender. If I have a number of records where y and x are known but gender is not, it does not seem sensible to throw out this information just because I would like to adjust my estimates for gender. If, however, I impute gender I can avoid losing these data. As long as gender is only in the model as a confounder, I don't see that it does much harm to have an imputed value of, say, .2 for some patient, which means that, based on her other covariates, she is 4 times more likely to be of one gender than the other.

A tricky problem with imputation is that we often lack assurance that the missing values are missing at random. However, even in this situation, it is unclear that the complete case analysis is superior to an imputed analysis for the situation described above.

Imputation becomes much more problematic when some variables of primary interest have missing values. The imputation gurus do not like the single conditional imputation provided by Stata (see, for example, Little and Rubin 2002). This is because this technique underestimates the standard errors of the regression coefficients for covariates with imputed values and overestimates the degrees of freedom.
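The loss of variability behind that objection can be seen in a small simulation. What follows is only a rough sketch in plain Python, not Stata's impute command: the data are simulated, the names (fit_line, single_conditional_impute, gender, observed) are hypothetical, and the imputation model is a one-predictor least-squares fit. It fills missing 0/1 values with fitted values, which come out fractional, and compares their spread with that of the values actually observed.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def single_conditional_impute(xs, gs):
    """Replace missing entries (None) in gs with fitted values from the complete cases."""
    complete = [(x, g) for x, g in zip(xs, gs) if g is not None]
    a, b = fit_line([x for x, _ in complete], [g for _, g in complete])
    return [g if g is not None else a + b * x for x, g in zip(xs, gs)]


random.seed(2004)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]

# Hypothetical 0/1 covariate whose probability of being 1 rises with x (kept in [0.2, 0.8]).
gender = [1 if random.random() < 0.5 + 0.15 * max(-2, min(2, xi)) else 0 for xi in x]

# Set about 30% of the gender values to missing, completely at random.
observed = [g if random.random() > 0.3 else None for g in gender]

filled = single_conditional_impute(x, observed)
imputed_only = [f for f, o in zip(filled, observed) if o is None]
observed_only = [o for o in observed if o is not None]

print("complete cases:", len(observed_only), "of", n)
print("example imputed values:", [round(v, 3) for v in imputed_only[:3]])
print("sd of observed 0/1 values:", round(statistics.pstdev(observed_only), 3))
print("sd of imputed values:", round(statistics.pstdev(imputed_only), 3))
```

The imputed values sit close to the regression line and so vary far less than real 0/1 data would. A later regression that treats them as observed data counts more information than it actually has, which is the source of the too-small standard errors.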
Multiple imputation methods get around this problem and are fine as long as you are confident that the missing values are missing at random. If you are only using imputation for confounding variables, I'm not convinced that it makes much difference how you do the imputation. However, multiple imputation is always theoretically preferable and can avoid hassles in the event that you come up against a referee who objects to all use of single conditional imputation.

Bill Dupont

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jennifer Wolfe Borum
Sent: Thursday, April 08, 2004 5:50 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Imputing values for categorical data

Hello,

I am working with a data set composed of responses to survey questions, which contains some categorical variables such as gender and ethnicity. The data have missing values, and I have decided that it would be best to keep all observations due to a pattern in the missing values. I have decided to use the impute command in Stata to handle this because I've had some difficulty with, and am not familiar enough with, the hotdeck and Amelia imputations. I've found that impute works fine for the continuous variables; for the categorical variables, however, I am obtaining values that I am unsure how to interpret. For example, I will get an imputed value of .35621 for gender, which is coded 1 or 0. Would anyone be able to help with the interpretation of the values I am obtaining for the categorical data?

Also, I would be interested in knowing which approach other Stata users prefer for imputing values, as this is the first time I have encountered missing values and I am just beginning to research the various methods of imputation.

Thanks in advance,

Jennifer
Graduate Student
Florida International University

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2016 StataCorp LP