Re: st: Stata 11 imputation
Fred Wolfe <email@example.com>
Tue, 28 Jul 2009 06:21:11 -0500
Thank you very much for your thorough reply. I do hope that Stata
continues to actively support ICE. Stata will be a richer package if
both current major methods are available and supported.
On Mon, Jul 27, 2009 at 4:32 PM, Yulia Marchenko, StataCorp wrote:
> Fred Wolfe <firstname.lastname@example.org> asks about imputing
> multiple categorical variables using -mi impute mvn- available
> as of Stata 11:
>> I wonder if it might be possible in a revision of the manual to
>> actually describe how to impute categorical values without having to
>> purchase Allison's book (available on Amazon.com at a reasonable
>> cost). There are a lot of "simple" examples in the manual, but no
>> complex examples, something that would be helpful.
> Before I answer Fred's specific questions, let me note that imputing multiple
> categorical variables is a difficult task in general. Currently, there is no
> definitive recommendation in the literature as to which imputation method should
> be used to perform this task.
> Multivariate normal imputation is not designed for imputing multiple
> categorical variables. However, Allison (2001, 40) suggests an ad hoc way of
> doing this. One can use a dummy representation of categorical
> variables to impute the corresponding indicator variables. For example, if a
> variable contains three categories, one will impute two indicator variables,
> corresponding to two categories, and then will compute the third indicator
> variable, corresponding to the reference category, as one minus the sum of the
> two imputed indicator variables. The imputed indicator variables will contain
> values on a continuous scale. To convert them to the binary metric, you
> assign 1 to the indicator variable with the largest value and 0 to the other
> indicator variables. More simulation is needed to evaluate the performance of
> this method in practice.
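The rounding step described above can be sketched in a few lines. This is a minimal illustration in Python rather than Stata, not code from Allison's book or from -mi impute mvn-; the function name `round_indicators` and the tie-breaking rule are assumptions made for the example.

```python
def round_indicators(imputed):
    """Convert continuous imputed indicators to a 0/1 coding.

    `imputed` holds the K-1 imputed indicator values for one observation;
    the reference-category indicator is not imputed but derived.
    """
    # Reference indicator = one minus the sum of the imputed indicators.
    reference = 1.0 - sum(imputed)
    values = list(imputed) + [reference]
    # Assign 1 to the largest value and 0 to the rest.
    # Assumption: ties go to the first category in order.
    winner = max(range(len(values)), key=lambda k: values[k])
    return [1 if k == winner else 0 for k in range(len(values))]


# Three categories: two imputed indicators plus the derived reference.
print(round_indicators([0.7, 0.2]))    # first category is largest
print(round_indicators([0.1, 0.3]))    # reference (1 - 0.4 = 0.6) is largest
print(round_indicators([-0.2, 0.55]))  # imputed values may fall outside [0, 1]
```

Because the imputed indicators come from a normal model, they can be negative or exceed one; the rule only compares their relative sizes.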
> Allison (2001) also notes that the analysis using imputed values without
> rounding is superior to that which uses rounded imputed values (as described
> above). Our simulations displayed similar behavior in the case of binary
> However, if a binary or categorical _dependent_ variable is being imputed
> using a regression-based method, rounding is unavoidable.
>> Would it be possible for StataCorp people to indicate on the list the
>> advantages of their multivariate method compared with Royston's.
> -mi impute mvn- implements a method for imputing multivariate continuous data
> based on Schafer (1997), which is an extension of the theoretical work by Li
> (1988). This method is commonly referred to as NORM. NORM assumes a joint
> multivariate normal distribution and uses data augmentation (an iterative MCMC
> procedure) to simulate a predictive distribution from which imputed values are
> drawn.
> Patrick Royston's -ice- command implements imputation via chained equations
> (ICE). ICE uses Gibbs sampling, another MCMC procedure, to obtain imputed
> values. ICE, however, does not assume a joint multivariate model. Instead,
> it uses a set of univariate full conditional specifications. In general,
> these do not always lead to a proper multivariate distribution.
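The idea of cycling through univariate conditional models can be illustrated with two continuous variables. The sketch below is in Python, not Stata, and is not the -ice- algorithm itself: it is a deliberately simplified version in which each variable is regressed on the other and its missing entries are redrawn from the fitted normal conditional. It keeps the fitted regression parameters fixed within each update, whereas a full implementation would also draw them from their posterior. All function names and the toy data are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y on x: returns (intercept, slope, residual SD)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def update(target, filled_target, filled_other, rng):
    """Redraw the missing entries of one variable given the other."""
    # Fit the conditional model on rows where the target is observed.
    observed = [i for i, v in enumerate(target) if v is not None]
    intercept, slope, sd = fit_line(
        [filled_other[i] for i in observed], [target[i] for i in observed]
    )
    # Replace each missing entry with a draw from the fitted conditional.
    for i, v in enumerate(target):
        if v is None:
            filled_target[i] = rng.gauss(intercept + slope * filled_other[i], sd)


def chained_imputation(a, b, cycles=10, seed=1):
    """Return one completed copy of a and b (None marks a missing value)."""
    rng = random.Random(seed)

    def mean_fill(v):
        obs = [x for x in v if x is not None]
        m = sum(obs) / len(obs)
        return [m if x is None else x for x in v]

    # Start from mean-filled data, then cycle through the conditionals.
    a_filled, b_filled = mean_fill(a), mean_fill(b)
    for _ in range(cycles):
        update(a, a_filled, b_filled, rng)  # a given b
        update(b, b_filled, a_filled, rng)  # b given a
    return a_filled, b_filled


a = [1.0, 2.0, None, 4.0, 5.0, 6.0]
b = [2.1, None, 6.2, 8.1, 9.8, 12.2]
a_done, b_done = chained_imputation(a, b)
print(a_done[2], b_done[1])
```

No joint distribution for the two variables is ever written down; only the two conditional regressions are specified, which is the feature of the chained-equation approach discussed here.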
> The main advantage of NORM is a theoretical one -- the convergence of the
> method to a proper posterior distribution is theoretically justified.
> Theoretical justification for the chained-equation approach in general is not
> as well developed in the literature, mainly because the chained-equation approach
> is not always supported by a proper underlying multivariate model; see, for
> example, van Buuren (2007).
> The main advantage of ICE is that it is more flexible than NORM and can more
> directly handle non-continuous data. However, as mentioned above, convergence
> to a proper multivariate distribution can be an issue.
> Under the assumption of normality, ICE corresponds to a pure Gibbs sampling
> procedure and is equivalent to NORM. The two procedures performed comparably
> in our simulation. More simulation is needed, however, to compare the two
> methods for imputing binary or categorical data.
> Allison, P. D. 2001. Missing Data. Thousand Oaks, CA: Sage.
> Li, K.-H. 1988. Imputation using Markov chains. Journal of Statistical
> Computation and Simulation 30: 57--79.
> Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton,
> FL: Chapman & Hall/CRC.
> van Buuren, S. 2007. Multiple imputation of discrete and continuous data by
> fully conditional specification. Statistical Methods in Medical Research 16:
> -- Yulia
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
National Data Bank for Rheumatic Diseases
NDB Office +1 316 263 2125 Ext 0
Research Office +1 316 686 9195