



Re: st: Multiple Imputation in Longitudinal Multilevel Model


From   "JVerkuilen (Gmail)" <jvverkuilen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Multiple Imputation in Longitudinal Multilevel Model
Date   Wed, 6 Mar 2013 10:00:26 -0500

On Wed, Mar 6, 2013 at 9:32 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
> 1. Are read1-read3 and math1-math3 three measurements taken at the
> same time for a given individual, or measurements taken over three
> periods? If the former, then your model is "flat", as it does not
> recognize and utilize the longitudinal/multilevel nature of the data.

Yes, you need to build that structure in, which can be quite
challenging. Usually you need to add independent variables that
capture the time trend and the panel structure. If you can afford to
add dummies for each group (i.e., fixed effects), it's worth doing,
and for the time structure a linear, quadratic, and cubic term, or
some kind of regression spline, is also worth considering.
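
A minimal sketch of what that might look like, assuming the data start
wide with read1-read3 and math1-math3 and an identifier -id- (all
variable names here are placeholders, not from the original post):

```stata
* reshape so each row is one subject-occasion, then fit a growth
* model with a quadratic time trend and a random slope on time
reshape long read math, i(id) j(time)
gen time2 = time^2
xtmixed read time time2 || id: time, covariance(unstructured)
```

A regression spline in -time- could replace the polynomial terms if
the trend is not well approximated by a quadratic.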


> 2. Once you've done -ice-, don't touch anything (let alone anything as
> drastic as -drop if _mj==0-), and use -mi: estimate- for everything. I
> don't really know how well either -mi- or -ice- go with -reshape-, but
> I suspect that if not done properly, it will screw up the delicate
> mechanics of -mi-.

And given that you can now run chained equations directly in -mi-
(via -mi impute chained-, available since Stata 12), I'd suggest
working in -mi- from the start rather than in -ice-. There is nothing
wrong with -ice-, but staying entirely within -mi- avoids the
bookkeeping problems Stas describes and is likely to be much easier.
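
A sketch of the all-in-mi workflow, again with placeholder covariates
(age, female) standing in for whatever predictors are actually
available:

```stata
* declare the mi style, register the incomplete variables, impute
* with chained equations, then estimate only through -mi estimate-
mi set wide
mi register imputed read1-read3 math1-math3
mi impute chained (regress) read1-read3 math1-math3 = age female, add(50)
mi estimate: regress read3 math3 age female
```

Because -mi- tracks the imputations itself, there is no _mj variable
to drop and no temptation to touch the imputed data by hand.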



> 3. I agree with Jay that 4 imputations are woefully insufficient. I
> have heard the arguments that you don't see much Monte Carlo
> variability beyond 5 imputations, but I can put two arguments in favor
> of a much greater number, like M=50: first, you don't explore the
> multivariate space of missing data enough (M=5 may be OK for a
> univariate mean, but I can't see how it can work for a 30-dimensional
> space), and second, I want my minimum degrees of freedom to be greater
> than the nominal sample size, so that the limitation on the accuracy
> really comes from the data rather than the computer.

The original argument came from Don Rubin's calculations on
univariate means and OLS regression coefficients, and it really
doesn't extend past that. Kenward and Carpenter have since argued
that you should use many more imputations. This is discussed in the
[MI] manual, p. 5, with citations. It depends on what you want to
know: for a univariate mean a small number of imputations is no big
deal, whereas if you're doing logistic regression on relatively rare
events you need many more.
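
For what it's worth, the classic calculation behind the "5 is enough"
claim is Rubin's large-sample relative efficiency, RE = (1 +
gamma/M)^-1, where gamma is the fraction of missing information. You
can see in Stata why efficiency alone makes small M look acceptable
(and why it is not the whole story):

```stata
* relative efficiency of M imputations with gamma = 0.5
display "M=5:  " 1/(1 + 0.5/5)     // about .909
display "M=50: " 1/(1 + 0.5/50)    // about .990
```

Point-estimate efficiency is already high at M=5; the case for M=50
rests instead on degrees of freedom, Monte Carlo stability of
standard errors and p-values, and coverage of the multivariate
missing-data space.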





> 4. If you are bringing additional variables to the -xtmixed- model,
> you would probably have been better off using these variables in
> imputation. You had a reason to believe that they affected the
> response, and for that same reason they should be in the imputation
> model.

I'll go one step further: the imputation model needs to be more
comprehensive than the analysis model. Leaving an analysis variable
out of the imputation model biases its estimated association toward
zero, so everything in the analysis model, plus anything that
predicts either the missing values or the missingness itself, should
go into the imputation model.
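
Schematically, with hypothetical auxiliary variables aux1 and aux2
(good predictors of the incomplete variable that the analysis model
omits):

```stata
* auxiliaries enter the imputation model but not the analysis model;
* the reverse arrangement is what causes trouble
mi impute chained (regress) read = age female aux1 aux2, add(50)
mi estimate: regress read age female
```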
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

