Statalist



RE: st: Best method for imputing dichotomous variables


From   Maarten buis <[email protected]>
To   [email protected]
Subject   RE: st: Best method for imputing dichotomous variables
Date   Thu, 14 Feb 2008 13:29:09 +0000 (GMT)

--- René Wevers <[email protected]> wrote:
> The basis of the statement is twofold, indeed one reason is coming
> from the hard to explain results I got from -ice- for the (continuous)
> variables I mentioned yesterday.

They are quite easily explained (though you mentioned that you still
needed to check that): the distribution of company sizes is never
going to be anywhere near a Gaussian (normal) distribution, however
strange your sampling scheme may be.
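
To illustrate why a Gaussian model sits poorly with a size variable, here is
a minimal sketch in Python (not Stata). The lognormal shape, its parameters,
and the sample size are my assumptions for illustration only; they are not
taken from René's data, and the code does not reproduce what -ice- does
internally. It only shows that draws from a normal distribution matched to
the mean and standard deviation of a strictly positive, right-skewed
variable can easily be negative.

```python
import random
import statistics

rng = random.Random(2008)

# Hypothetical company sizes: lognormal, so strictly positive and right-skewed.
sizes = [rng.lognormvariate(3.0, 1.5) for _ in range(5000)]

mean_size = statistics.mean(sizes)
sd_size = statistics.stdev(sizes)

# Draws from a Gaussian with the same mean and standard deviation,
# standing in for what a normal-based model would consider plausible.
normal_draws = [rng.gauss(mean_size, sd_size) for _ in range(5000)]
share_negative = sum(d < 0 for d in normal_draws) / len(normal_draws)

print("smallest observed size:", min(sizes))
print("median vs mean size:", statistics.median(sizes), mean_size)
print("share of Gaussian draws below zero:", share_negative)
```

A common response is to model a transformed variable (for example the log
of size) instead, but whether that suits a particular dataset has to be
checked against the data.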

> However, another reason comes from a simple test I performed with 
> -ice-. I randomly created missing values (25%) for a dichotomous
> variable where there were none missing and imputed these 'missing'
> values with -ice-. Afterwards approx. 700 out of 3000 imputed
> values proved to be different from the original values. When I used
> -impute- and rounded the results only 350 out of 3000 imputed values
> were different from the original values. Naturally this is a very
> weak test, but 700 out of 3000 'faulty' imputed values does not give
> me a lot of confidence in -ice- for my case.

I like simulations as a means of gaining understanding of statistical
techniques, and you and I are in good company: there is a working
paper by Stef van Buuren, Jaap Brand, Karin Groothuis-Oudshoorn, and
Don Rubin (who invented multiple imputation) that reports a simulation
study of MICE (R and Splus), -ice- (Stata), and IVEWARE (SAS) (full
reference below).

In the past I have posted a number of simulations of -ice- on
Statalist:
http://www.stata.com/statalist/archive/2007-04/msg00900.html
http://www.stata.com/statalist/archive/2007-05/msg00778.html
http://www.stata.com/statalist/archive/2007-12/msg00504.html

Neither Van Buuren et al. nor I could find anything systematically
wrong with -ice-. The reason your finding differs from ours is that
you used the wrong criterion for success: multiple imputation never
claims to recover the unobserved values themselves; it claims (under
the MAR assumption) to recover means, proportions, and variances of
variables and patterns of association between variables.
Counterintuitive as it may sound, the "better" performance of -impute-
is actually the result of the fact that it is worse than -ice- (it
ignores the uncertainty around the prediction).
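
The point can be sketched with a small simulation, written here in Python
rather than Stata. Everything in it is an assumption made for illustration:
a single binary covariate x, the probabilities 0.1 and 0.4, and the use of
the true probabilities as the imputation model. Real -ice- estimates the
model from the data and also reflects the uncertainty in that estimate, so
this is only a caricature of the two strategies: rounding a predicted
probability (in the spirit of rounding -impute- output) versus drawing from
it (the idea behind -ice-).

```python
import random


def simulate(n=3000, seed=2008):
    """Compare deterministic and stochastic imputation of a binary variable."""
    rng = random.Random(seed)

    # Hypothetical model: P(y = 1 | x) for a binary covariate x.
    p_given_x = {0: 0.1, 1: 0.4}
    x = [rng.randint(0, 1) for _ in range(n)]

    # "True" values that we pretend were set to missing.
    y = [int(rng.random() < p_given_x[xi]) for xi in x]

    # Deterministic: round the predicted probability to 0 or 1.
    det = [int(p_given_x[xi] >= 0.5) for xi in x]

    # Stochastic: draw each imputed value from the predicted probability.
    sto = [int(rng.random() < p_given_x[xi]) for xi in x]

    def mismatches(imputed):
        # Number of imputed values that differ from the true values.
        return sum(a != b for a, b in zip(imputed, y))

    return {
        "true_prop": sum(y) / n,
        "det_prop": sum(det) / n,
        "sto_prop": sum(sto) / n,
        "det_mismatch": mismatches(det),
        "sto_mismatch": mismatches(sto),
    }


result = simulate()
for name, value in result.items():
    print(name, value)
```

Under these assumptions the rounded predictions are all 0, so they disagree
with the true values less often than the random draws do, yet they put the
imputed proportion at 0 and remove any association between x and y among
the imputed cases. The random draws "miss" more individual values but
reproduce the proportion, which is the criterion multiple imputation is
actually aiming at.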

Hope this helps,
Maarten

VAN BUUREN S, BRAND JPL, GROOTHUIS-OUDSHOORN CGM, RUBIN DB. Fully
Conditional Specification in Multivariate Imputation. Journal of
Statistical Computation and Simulation, in press. Simulation study on
the MICE algorithm.
http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

