Re: st: Multiple Imputation (MI)

From   Richard Williams <>
To   Statalist <>
Subject   Re: st: Multiple Imputation (MI)
Date   Tue, 14 May 2013 00:40:32 -0500

This is a very good source:


Here and elsewhere you'll see the case made for what Royston calls the "just another variable" approach. Compute interaction and squared terms first and then impute them like you would any other variable.
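As a rough illustration of the "compute first" step only, here is a minimal Python sketch. The variable names (x1, x2, x1_x2, x1_sq) and the toy records are hypothetical, and the imputation itself is not shown; the point is just that the product and squared terms are built from the observed data and stay missing wherever an input is missing, so that they can be imputed as variables in their own right.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"x1": 2.0, "x2": 3.0},
    {"x1": None, "x2": 1.5},
    {"x1": 4.0, "x2": None},
]


def product_or_missing(a, b):
    """Return a * b, or None if either input is missing."""
    if a is None or b is None:
        return None
    return a * b


def add_derived_terms(records):
    """Add the interaction and squared terms BEFORE any imputation is run."""
    out = []
    for r in records:
        new = dict(r)
        new["x1_x2"] = product_or_missing(r["x1"], r["x2"])  # interaction term
        new["x1_sq"] = product_or_missing(r["x1"], r["x1"])  # squared term
        out.append(new)
    return out


derived = add_derived_terms(rows)
```

Under this approach x1_x2 and x1_sq would go into the imputation model alongside x1 and x2, rather than being recomputed from the imputed x1 and x2 afterward.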

I'm not quite sure about centering. My inclination is to still use the "just another variable" approach. But one of the things I dislike about centering around the mean is that the mean changes from sample to sample. Rather than centering about the mean, you might center about some other meaningful value. For example, if you were examining years of education in the United States, you might subtract 12 so that 0 stood for high school graduate.
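To make the contrast concrete, here is a small Python sketch with two made-up samples of years of education (the data and the names sample_a, sample_b, and REFERENCE_YEARS are assumptions for illustration only). Subtracting 12 applies the same shift to every sample, while the observed mean differs between them.

```python
REFERENCE_YEARS = 12  # 0 then stands for high school graduate


def center(values, reference):
    """Subtract a reference value, leaving missing entries (None) untouched."""
    return [None if v is None else v - reference for v in values]


def observed_mean(values):
    """Mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


# Two hypothetical samples; None marks a missing value.
sample_a = [10, 12, 16, None, 14]
sample_b = [12, 12, 16, 18, None]

# Fixed reference: the shift is 12 in both samples.
fixed_a = center(sample_a, REFERENCE_YEARS)
fixed_b = center(sample_b, REFERENCE_YEARS)

# Mean centering: the shift depends on which sample you happen to have.
mean_a = observed_mean(sample_a)
mean_b = observed_mean(sample_b)
```

Because the reference value is fixed in advance, it also does not move when missing values are later filled in, so there is nothing to re-center after imputation.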

Also, several good references are listed at


White, Royston and Wood 2011 seems very good to me.

Finally, you can see this earlier exchange on Statalist where Paul Allison offered his thoughts:

At 09:06 PM 5/13/2013, Saul G. Alamilla wrote:
Dear Statalist Members,

I have some questions pertaining to multiple imputation. I have a
dataset of about 10,000 individuals and need to impute some variables with
considerable missingness (MAR). I am using the ice and mi commands in Stata 11.2.

I plan to include substantive interaction terms (mean-centered) in the
imputation model.

My questions/concerns are as follows:

1) Does mean centering need to be performed before imputing the data? If so,
because the "centered" means will almost surely not be 0 after imputation,
would it be advisable to center again at that point?

2) Is there a satisfactory way to impute interaction terms? Are
there any specific references regarding imputation and interaction terms (other
than articles such as Graham, 2009, which deals with interactions in passing)?
One approach would be to impute each of the input variables
individually and then take their product, but the imputing of the input
variables is especially delicate inasmuch as nonlinearities are introduced.

3) On a side note, are there any satisfactory ways to perform MI in
Stata with clustered data? I am aware of programs such as PAN (Schafer 2001,
Schafer & Yucel 2002), but am looking for MI commands or programs in Stata
geared for clustered/nested data, OR acceptable and manageable strategies for
imputing with such data.

Thanks in advance,

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
