Re: st: Multiple Imputation in Longitudinal Multilevel Model

From   "JVerkuilen (Gmail)" <>
Subject   Re: st: Multiple Imputation in Longitudinal Multilevel Model
Date   Wed, 6 Mar 2013 10:00:26 -0500

On Wed, Mar 6, 2013 at 9:32 AM, Stas Kolenikov <> wrote:
> 1. Are read1-read3 and math1-math3 three measurements taken at the
> same time for a given individual, or measurements taken over three
> periods? If the former, then your model is "flat", as it does not
> recognize and utilize the longitudinal/multilevel nature of the data.

Yes, the imputation model needs to reflect that structure, which can
be quite challenging. Usually you need to add independent variables
that capture the time trend and the panel (group) structure. If you can
afford dummies for each group (i.e., fixed effects), they are worth
including; for the time structure, linear, quadratic and cubic terms,
or some kind of regression spline, are also worth considering.
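
A rough sketch of what that could look like with long-format data. The
variable names (id, school, wave, read, math) are made up for
illustration, and school and wave are assumed to be fully observed. Note
that imputing in long form like this does not use a person's scores from
other waves, so treat it as a starting point only.

```stata
* Sketch only: hypothetical variables id, school, wave (1-3), read, math.
* With three waves, linear + quadratic already saturates the time trend;
* a cubic term or a spline only makes sense with more waves.
generate wave2 = wave^2

mi set mlong
mi register imputed read math
mi register regular wave wave2 school

* Time polynomial plus group dummies (fixed effects) as predictors.
mi impute chained (regress) read math = wave wave2 i.school, ///
    add(50) rseed(2013)
```

Whether group dummies are affordable depends on how many groups there
are relative to the number of observations per group.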

> 2. Once you've done -ice-, don't touch anything (let alone anything as
> drastic as -drop if _mj==0-), and use -mi: estimate- for everything. I
> don't really know how well either -mi- or -ice- go with -reshape-, but
> I suspect that if not done properly, it will screw up the delicate
> mechanics of -mi-.

And given that -mi- supports chained equations (-mi impute chained-),
I'd really suggest doing everything in -mi- directly rather than in
-ice-. Nothing wrong with -ice-, but staying entirely within -mi- is
likely to be much easier.
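
For concreteness, a minimal sketch of an all -mi- workflow, assuming
wide data with read1-read3 and math1-math3 as in the question. The
variables id, female and ses are hypothetical, and female and ses are
assumed to have no missing values. This is one plausible setup, not
necessarily the right model for these data.

```stata
* Sketch only: hypothetical variables id, female, ses;
* read1-read3 and math1-math3 taken from the question.
mi set wide
mi register imputed read1 read2 read3 math1 math2 math3
mi register regular female ses

* Chained equations within -mi-, in place of -ice-.
mi impute chained (regress) read1 read2 read3 math1 math2 math3 ///
    = female ses, add(50) rseed(2013)

* Reshape with the mi-aware command rather than plain -reshape-.
mi reshape long read math, i(id) j(wave)

* Fit the model on every imputation and combine the results.
* (-xtmixed- was renamed -mixed- in Stata 13.)
mi estimate: xtmixed read math female ses wave || id:
```

Nothing is dropped or edited by hand between imputation and estimation;
-mi reshape- and -mi estimate- keep the imputations consistent.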

> 3. I agree with Jay that 4 imputations are woefully insufficient. I
> have heard the arguments that you don't see much Monte Carlo
> variability beyond 5 imputations, but I can put two arguments in favor
> of a much greater number, like M=50: first, you don't explore the
> multivariate space of missing data enough (M=5 may be OK for a
> univariate mean, but I can't see how it can work for a 30-dimensional
> space), and second, I want my minimum degrees of freedom to be greater
> than the nominal sample size, so that the limitation on the accuracy
> really comes from the data rather than the computer.

The original argument came from Don Rubin's calculations for
univariate means and OLS regression coefficients, and it really doesn't
extend past those cases. Kenward & Carpenter did some work suggesting
that you should use many more imputations; this is discussed, with
citations, in the MI manual, p. 5. It depends on what you want to
know: for a univariate mean it's no big deal and you can get away
with a small number of imputations, whereas if you're doing logistic
regression on relatively rare events you need many more.
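
One way to check whether M is large enough is to look at the degrees of
freedom and the fraction of missing information that -mi estimate-
reports. A sketch, assuming the data are already -mi set- with a few
imputations, and that the imputation model shown (with the hypothetical
covariates female and ses) is the same one used to create them:

```stata
* Sketch only: add 45 imputations to the existing ones.
mi impute chained (regress) read1 read2 read3 math1 math2 math3 ///
    = female ses, add(45) rseed(2013)

* Confirm how many imputations the data now hold.
mi describe

* vartable: relative variance increase, fraction of missing information.
* dftable:  degrees of freedom for each coefficient.
mi estimate, vartable dftable: regress read3 math3 female ses
```

If the degrees of freedom are small relative to the sample size, or the
results move noticeably when M changes, that suggests adding more
imputations.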

> 4. If you are bringing additional variables to the -xtmixed- model,
> you would probably have been better off using these variables in
> imputation. You had a reason to believe that they affected the
> response, and for that same reason they should be in the imputation
> model.

I'll go one step further: The imputation model needs to be more
comprehensive than the analysis model.