Statalist



RE: st: Stata 11 imputation


From   ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: Stata 11 imputation
Date   Tue, 28 Jul 2009 15:46:18 -0500

Peter Lachenbruch <Peter.Lachenbruch@oregonstate.edu> asks:

> In my problem, I have some continuous (maybe 'normal') variables, some 
> dichotomous variables, and some categorical variables.  It looks like mi
> impute will allow me to impute the normal variables and all others, but when
> I want to impute the categorical variables it looks as if I will re-impute
> the normal ones as categories.  I will likely need to continue to use ICE.

In the general case, when the pattern of missingness is arbitrary and the
variables are of different types and must be modeled simultaneously, -ice- is the
most flexible method.  With this in mind, we developed -mi import ice- and -mi
export ice- allowing users to switch between -mi-'s and -ice-'s data formats
easily.  This way you can still use -ice- to obtain imputations and then
import imputed data to -mi- to utilize Stata's new data management and
estimation capabilities.

As a side note, there are alternatives to imputation via chained equations
(ICE) for multivariate categorical and mixed data that are based on
underlying joint models: log-linear and general-location models (Schafer
1997).  These methods, however, are very restrictive with respect to the
dimensionality of the model and often have difficulty converging.
Therefore, ICE remains the most practical choice, albeit a less theoretically
justified one.

Hypothetically, one can use -mi impute mvn- in the above most general case and
then round imputed binary and categorical variables (if needed) afterward.
Depending on the number of binary and categorical variables, however, the
underlying assumption of joint normality may be suspect.  In any case,
extensive simulations are needed to investigate the robustness of this method
with mixed types of variables.

Let me briefly describe the cases for which one could still use -mi impute- in
the presence of different types of variables.

1. When the pattern of missingness is monotone (which I admit is rare in
   practice), one can use -mi impute monotone- to impute variables of
   different types simultaneously.

2. If only a few observations break an otherwise monotone-missing pattern,
   one can consider discarding those observations and then proceeding with
   -mi impute monotone-.

3. If it is reasonable to assume independence among blocks of variables, these
   blocks of variables can be imputed separately using combinations of -mi
   impute monotone-, -mi impute mvn-, and any of the available univariate
   imputation methods (e.g. -mi impute regress-, -mi impute logit-, etc.).
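
As a rough sketch of how cases (1) and (2) might be checked in practice, the
do-file fragment below simulates two categorical variables, inspects their
missing-value pattern with -misstable-, and drops the observations that break
monotonicity.  The variable names, seed, and the way missing values are
generated are assumptions made purely for illustration; they do not come from
the original question.

```stata
* Illustrative data only: x4 and x5 are hypothetical 3-level categorical variables
clear
set seed 2009
set obs 100
generate x3 = rnormal()
generate x4 = ceil(3*runiform())
generate x5 = ceil(3*runiform())

* x5 is missing in observations 1-30 and x4 in observations 1-10, so the
* missing values of x4 are nested within those of x5 (a monotone pattern)
replace x5 = . in 1/30
replace x4 = . in 1/10

* Two extra observations where x4 is missing but x5 is observed break the pattern
replace x4 = . in 99/100

* Inspect the missing-value patterns and the nesting among the variables
misstable patterns x4 x5
misstable nested x4 x5

* Case (2): discard the few offending observations, then check again
drop if missing(x4) & !missing(x5)
misstable nested x4 x5
```

Whether discarding observations is acceptable depends on how many are lost and
on why they are incomplete, so this step is a judgment call rather than a
default.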

In the case of (3), you might type 

        . mi impute mvn x1 x2 = x3, add(20)

        . mi impute monotone (mlogit) x4 x5 = x3, replace

Note that the second command did *NOT* replace the imputed values in x1 and
x2.  In the above, we assume that (x1, x2) and (x4, x5) are independent
conditional on the complete covariate x3.  We also assume that x1 and x2 are
continuous with arbitrarily missing values, and that x4 and x5 are categorical
and follow a monotone-missing pattern.
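
The two commands above presuppose data that have already been -mi set-, with
the incomplete variables registered as imputed.  A minimal do-file sketch of
the full sequence on simulated data might look as follows; the data-generating
steps, the seed, and the choice of the wide style are assumptions for
illustration only.

```stata
* Illustrative data only: x3 is complete, x1 and x2 are continuous with
* arbitrary missingness, x4 and x5 are categorical with monotone missingness
clear
set seed 2009
set obs 200
generate x3 = rnormal()
generate x1 = x3 + rnormal()
generate x2 = x3 + rnormal()
generate x4 = ceil(3*runiform())
generate x5 = ceil(3*runiform())
replace x1 = . if runiform() < 0.2
replace x2 = . if runiform() < 0.2
replace x5 = . in 1/40
replace x4 = . in 1/15

* Declare the mi style and register the variables
mi set wide
mi register imputed x1 x2 x4 x5
mi register regular x3

* Block 1: multivariate normal imputation of the continuous variables
mi impute mvn x1 x2 = x3, add(20)

* Block 2: monotone imputation of the categorical variables; replace
* fills in x4 and x5 in the 20 existing imputations
mi impute monotone (mlogit) x4 x5 = x3, replace
```

Here -add(20)- creates the 20 imputations in the first step, and -replace- in
the second step fills in x4 and x5 within those same imputations rather than
adding new ones.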


References:

Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton, 
FL: Chapman & Hall/CRC.


-- Yulia
ymarchenko@stata.com