st: R: missing data for LCA, cross-sectional complex survey data

Date   Wed, 7 Jul 2010 07:43:52 +0200

As far as references about missing data are concerned, Peter may want to
take a look at:

McKnight, P. E., McKnight, K.M., Sidani, S., Figueredo, A.J. Missing Data: A
Gentle Introduction. New York: The Guilford Press, 2007

Little RJA, Rubin DB. Statistical Analysis With Missing Data (2nd ed).
Chichester: Wiley, 2002.

Rubin DB. Multiple Imputation for Nonresponse in Surveys. Hoboken: Wiley
Classics Library, 2004.

Groves RM, Dillman DA, Eltinge JL, Little RJA. Survey Nonresponse. New York:
Wiley, 2002.

HTH and Kind Regards,

First time poster here. I am relatively new to Stata and I am still learning
about various approaches to handling missing data. My questions pertain
mostly to statistics and not coding. I am using Stata version 11.1.
I performed a "baseline" exploratory LCA using Mplus version 5.2. Of course,
Mplus handled the missing data on the latent class indicators using FIML,
and I obtained a 3-class solution. However, there was substantial
missingness on the polytomous covariates I wanted to include in the model
(ranging from 5% to 20%, MAR). Following the guidelines in Applied Survey
Data Analysis by Heeringa, West & Berglund, I was able to use Royston's MI
ICE command in Stata 11.1 to impute the missing data. I used all of the
analysis variables as well as auxiliary variables as recommended by Heeringa
et al., the UCLA Statistical computing website, and almost every other
source I encountered, including the Stata Journal, etc. (I have also ordered 2
texts on Amazon devoted entirely to handling missing data and I am eagerly
awaiting their arrival... if anyone can recommend something akin to the
Complete Idiot's Guide to Missing Data, I would be forever in your debt).
Next, I imported the imputed data sets (M=10) into Mplus. I then ran the
baseline latent class model again using the imputed data, but this time I
obtained a 3-class solution with wildly different proportions for each
latent class.
I have since updated to Mplus version 6, and I am receiving virtually the
same baseline 3-class solution as I obtained using MI ICE in Stata 11.1. If
I grok what I have been reading, MI is generally superior to FIML approaches
to handling missing data. But I still shouldn't be obtaining such vastly
different results for the baseline LC model, no? My questions are:
(1) Given the differences in the baseline LC solutions when using FIML vs
MI, is it safe to assume that I must have seriously screwed up when I
specified my missing data models?
(2) Is there an alternate universe where it might be acceptable to only use
MI for the covariates in the LC model while using FIML to handle missingness
on the LC indicators (I suspect the answer is a resounding "no" and that by
even posing such a question I deserve nothing but scorn and derision... As a
non-statistician, I figured it couldn't hurt to ask, however.  Please be
Thank you for your consideration. 

P. Cabrera
