
From: "Renzo Comolli" <renzo.comolli@yale.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Imputing values for categorical data
Date: Thu, 8 Apr 2004 23:29:44 -0400

Hi Jennifer,

I have one piece of advice: be very careful when using -impute-. It is not suitable for imputing categorical variables, and I am surprised the manual does not mention that. When I actually "ripped the ado-file open" and saw what it does, I gave up on imputing categorical variables. That said, I had never done imputations before, so I have very little knowledge of the field.

At its core, -impute- does a simple OLS projection. Let me explain with a simplified case first and then with a more complicated one.

Simplifying assumption: only one variable (denoted by y) needs to be imputed, and all the other variables (denoted by the matrix X) have no missing values. Without loss of generality, assume that you have sorted the data so that all the cases for which y is observed appear at the top (denote this part of the vector y by y') and all the missing cases appear at the bottom (denote this part by y"). Also denote by X' and X" the corresponding rows of X (remember that X has no missing values; X" just contains the X values corresponding to the observations in y").

Then -impute- simply runs the OLS regression y' = X'beta + epsilon, where beta is the vector of OLS coefficients. It saves the estimated beta and imputes y" as X"beta.

So, of course, this is completely unsuitable for categorical variables: a linear prediction for a 0/1 variable can be any real number, such as the .35621 you got for gender. Even with continuous variables you have to be careful not to predict "out of range". Suppose you are imputing "number of weeks worked": -impute- might well predict that the interviewee worked -1 weeks last year.

The case is not that simple when the matrix X itself contains missing values. In that case, -impute- looks for the best subset of regressors: in practice, it repeats the procedure above several times, trying to keep as many regressors as possible. (Exactly how, I did not understand from either the ado-file or the manual, but I did not spend much time on it because I did not care that much.)
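To make the simple case concrete, here is a minimal Python sketch of regression (OLS projection) imputation as described above. It is not the -impute- ado-file, only an illustration under stated assumptions: X has no missing values and includes a constant column, None marks a missing y, and the function names (`solve`, `ols`, `regression_impute`) and the small data set are hypothetical.

```python
# Hypothetical sketch of OLS-projection imputation (not Stata's -impute- code):
# fit y' = X'beta on the complete cases, then fill the missing y" with X"beta.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def regression_impute(X, y):
    """Return a copy of y with each None replaced by the linear prediction X_i beta.

    beta is estimated on the observed cases only (y' on X').
    """
    obs = [i for i, v in enumerate(y) if v is not None]
    beta = ols([X[i] for i in obs], [y[i] for i in obs])
    return [v if v is not None else sum(b * x for b, x in zip(beta, X[i]))
            for i, v in enumerate(y)]


if __name__ == "__main__":
    # Hypothetical data: a constant plus one regressor; y is a 0/1 variable
    # with two missing values (None).
    X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 2.5], [1, -5]]
    y = [0, 0, 1, 0, 1, None, None]
    print(regression_impute(X, y))  # last two entries are about 0.5 and -1.0
```

With a 0/1 outcome the two imputed values come out as about 0.5 and -1.0. Neither is a valid code: the first is the same kind of fractional value as the .35621 for gender, and the second is "out of range" in the same way as the -1 weeks worked.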
That said, I did not know of the other methods you mentioned (hotdeck, Amelia), and I would be glad to read what others have to say about them.

Best,
Renzo Comolli

--------------------------------------------------------------------------------
From: Jennifer Wolfe Borum <jjfrog@bellsouth.net>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Imputing values for categorical data
Date: Thu, 8 Apr 2004 18:50:21 -0400
--------------------------------------------------------------------------------

Hello,

I am working with a data set composed of responses to survey questions, which contains some categorical variables such as gender and ethnicity. The data have missing values, and I have decided that it would be best to keep all observations because of a pattern in the missing values. I have decided to use the impute command in Stata to handle this, as I've had some difficulty with, and am not familiar enough with, the hotdeck and Amelia imputations.

I've found that impute works fine for the continuous variables; however, for the categorical variables I am obtaining values that I am unsure how to interpret. For example, I get an imputed value of .35621 for gender, which is coded 1 or 0. Would anyone be able to help with the interpretation of the values I am obtaining for the categorical data?

Also, I would be interested in knowing which approach other Stata users prefer for imputing values, as this is the first time I have encountered missing values and I am just beginning to research the various methods of imputation.

Thanks in advance,
Jennifer
Graduate Student
Florida International University

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
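Jennifer's message mentions hot-deck imputation as an alternative. As a rough illustration of why it avoids the fractional-gender problem, the Python sketch below implements a generic within-cell random hot deck (not any particular Stata command or its options): each missing value is replaced by a value copied from a randomly chosen observed case in the same cell, so an imputed value is always one of the codes actually observed, such as 0 or 1. The function name `random_hot_deck` and the cell labels are hypothetical.

```python
import random


def random_hot_deck(y, cells, seed=None):
    """Hypothetical within-cell random hot deck.

    Each None in y is replaced by a value drawn at random from the observed
    y values (donors) in the same cell; cells[i] is a hashable label, e.g. an
    age group. A recipient whose cell has no donor is left as None.
    """
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(y, cells):
        if v is not None:
            donors.setdefault(c, []).append(v)  # collect observed values per cell
    out = []
    for v, c in zip(y, cells):
        if v is None and donors.get(c):
            out.append(rng.choice(donors[c]))  # copy a real observed code
        else:
            out.append(v)  # observed value, or no donor available
    return out


if __name__ == "__main__":
    # Hypothetical 0/1 gender codes with missing values, grouped into cells.
    gender = [0, 1, 1, None, 0, None, None]
    cells = ["a", "a", "a", "a", "b", "b", "c"]
    print(random_hot_deck(gender, cells, seed=1))
```

A single random draw treats the imputed values as if they were observed; repeating the draw several times (as multiple-imputation approaches do) is one way to reflect the extra uncertainty.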


© Copyright 1996–2014 StataCorp LP