



st: MI with huge repeated cross-sectional survey: split & merge?


From   Michelle Carras <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: MI with huge repeated cross-sectional survey: split & merge?
Date   Wed, 26 Mar 2014 14:15:10 +0000

Hi everyone,

I'm wondering what might be the best way to multiply impute some variables from a repeated cross-sectional survey dataset where some individuals do not have more than one year of observations. I'd like to use chained equations, but I have so many variables to impute over four years that when I try to impute them in the database as is, I can't get a model to converge.  

My ultimate goal is to multiply impute covariates to create a single database that has no missing covariates to use in latent profile modeling in Mplus, which drops any observations where covariates are missing. 

I'm starting with a wide format database with an indicator for year (y) of participation. I would like to impute, e.g.,  days of substance use (d) that are missing for that year and units of substance use (u) that are missing for years where the student participated and answered something other than "never" to days.  In the table below, .a indicates that the student did not participate for that year. 
									
ID	y9	y10	y11	d9	d10	d11	u9	u10	u11
1	1	0	0	.	.a	.a	.	.a	.a
2	0	1	1	.a	3	4	.a	15	.
3	0	0	1	.a	.a	1	.a	.a	.
								
									
I thought I might split the database into years and impute separately by year so I can get convergence, but I realize I'm then ignoring a lot of potentially valuable information from students who responded in other years. I tried that anyway and it seems to work, but I'm not sure whether my technique of splitting, imputing, then merging is correct: trying to match datasets with different numbers of imputations produces a lot of extra observations. How can I check that my result is a stacked database from each year that contains imputed observations for the covariates, so that when I model in Mplus I will be using the same database for each model (i.e., not dropping observations here and there)? Here's what I used to test the imputing/stacking technique:

save original
keep if y9==1
*this includes the kids who were in more than one year if they were in year9
mi set wide
mi register imputed ///	
	d9 u9 [other vars]
mi register regular ///
	[numerous other vars from year 9]
mi impute chained (pmm) d9 u9, ///
	add(2) noisily rseed(222)
save nine, replace
use original, clear
keep if y10==1
*this includes the kids who were in more than one year if they were in year10
mi set wide
mi register imputed ///	
	d10 u10 [other vars]
mi register regular ///
	[numerous other vars from year 10]
mi impute chained (pmm) d10 u10, ///
	add(2) noisily rseed(222)
save ten, replace
use nine, clear
mi merge 1:1 ID using ten, gen(merge)
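
A sketch of one possible check on the merged result, under some assumptions: the data are in mi wide style, so the imputed copies are assumed to follow the _1_d9, _2_d9 naming pattern, and the variable merge is the one created by gen(merge) above. This only inspects the result; it does not establish that the split/impute/merge approach itself is valid.

```stata
* how many IDs matched across the two files vs. appeared in only one
tabulate merge

* number of imputations (M) and the registered variables after the merge
mi describe

* year-9 participants whose imputed copies of d9 are still missing
* (assumes wide-style names _1_d9 and _2_d9; both counts should be 0)
count if y9==1 & missing(_1_d9)
count if y9==1 & missing(_2_d9)
```

If either count is nonzero, some year-9 participants would still be dropped on d9 in Mplus.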

Thanks for any help or suggestions! 

~Michelle




