RE: st: MI mvn with categorical data
Cameron McIntosh <firstname.lastname@example.org>
STATA LIST <email@example.com>
Fri, 30 Sep 2011 10:15:19 -0400
Or, possibly some other methods designed for imputation in the discrete case:
Royston, P. (2009). Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. The Stata Journal, 9(3), 466–477. http://www.stata-journal.com/article.html?article=st0067_4 http://ideas.repec.org/c/boc/bocode/s446602.html
van Buuren, S. (2010). Item Imputation Without Specifying Scale Structure. Methodology, 6(1), 31–36. http://www.stefvanbuuren.nl/publications/Item%20imputation%20-%20Methodology%202010.pdf
van Buuren, S. & Groothuis-Oudshoorn, K. (March 26, 2011). Multivariate Imputation by Chained Equations: Package ‘mice’, Version 2.8. http://cran.r-project.org/web/packages/mice/mice.pdf http://cran.r-project.org/web/packages/mice/index.html
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, forthcoming. http://www.stefvanbuuren.nl/publications/MICEinR-Draft.pdf
Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., & Rubin, D.B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064. http://www.stefvanbuuren.nl/publications/FCSinmultivariateimputation-JSCS2006.pdf
Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219–242. http://www.stefvanbuuren.nl/publications/MIbyFCS-SMMR2007.pdf
Lee, K.J., & Carlin, J.B. (2010). Multiple Imputation for Missing Data: Fully Conditional Specification Versus Multivariate Normal Imputation. American Journal of Epidemiology, 171(5), 624–632.
> Date: Fri, 30 Sep 2011 14:03:32 +0100
> Subject: Re: st: MI mvn with categorical data
> From: firstname.lastname@example.org
> To: email@example.com
> No idea on these specifics, but it seems odd not to be exploiting
> -mi-'s support for -logit- and -ologit-.
> Note that minimal date (year) references are (repeatedly) deprecated
> on this list.
> On Fri, Sep 30, 2011 at 1:49 PM, Andrea Bennett <firstname.lastname@example.org> wrote:
> > Dear Statalisters
> > I am working with Stata's <mi> command and am trying to impute missing data with <mi impute mvn>. A bunch of variables are categorical (some nominal, some ordered). Allison (2002) and Enders (2010) describe a procedure in which one creates dummies for all categories of a variable except a baseline category, imputes the values, and then calculates the missing baseline value as: baseline = 1 - category1 - category2 - category4 (for the case of a variable with four categories). The category dummy with the highest probability value is then coded as 1, the rest as zero.
> > I can do that just fine with <mi impute mvn>. I have complete and valid data after imputation (mi describe, detail). However, when I try to combine the single category dummies back into the original variable, <mi update> thinks that some imputations are no longer valid and drops imputation records for some of the original observations containing missing values. This also shows up in <mi describe, detail>, which reports, for example: categorical_variable(6; 20*2), meaning that the variable has 6 missing values, and out of 20 imputations, 2 are missing from all imputation sets.
> > I see no mistakes on my side with regard to putting the dummies back into a single variable. That all looks exactly as it should. Yet, as soon as the data are updated, some observations of the imputed data get deleted, leaving me only with the original missing observation for that record.
> > I run the imputation in the <mlong> format. From the Stata handbook I understand that under <mlong> I can perform any data manipulation without <mi passive> as long as I manually register the variables afterwards. I also tried to do all manipulation with <mi passive> or <mi xeq>, but Stata tells me that I need something better than Stata IC in order to handle more variables. I guess these are auxiliary variables that Stata needs to perform the imputation, since I am nowhere near the maximum of Stata IC. Alternatively, it might be related to the fact that some queries specifically check that _mi_miss==., and hence <mi passive> might run into trouble simply because of that.
> > Do any of you know what exactly is causing the problem, and how to solve it? I know I could try <mi impute chained>. From a theoretical point of view, I prefer <mvn>, and I have also already put quite a bit of effort into the <mvn> analysis. Moreover, I would simply like to know why it does not work.
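For readers following the dummy-variable procedure described in the quoted question, here is a minimal sketch of the recombination step for a single observation in a single imputation. It is written in Python rather than Stata, purely to illustrate the arithmetic. The function name recover_category, the category labels, and the example values are hypothetical, and nothing here reproduces what <mi impute mvn> or <mi update> does internally.

```python
def recover_category(imputed_dummies, baseline_label):
    """Turn imputed (continuous) dummy values into one category label.

    imputed_dummies: dict mapping each non-baseline category label to
        its imputed dummy value (hypothetical input format).
    baseline_label: label of the category that was left out as baseline.
    """
    # Baseline value is whatever remains after subtracting the others from 1.
    baseline_value = 1.0 - sum(imputed_dummies.values())

    # Put all categories, including the baseline, on the same footing.
    scores = dict(imputed_dummies)
    scores[baseline_label] = baseline_value

    # The category with the highest value is coded 1, the rest 0,
    # which is equivalent to returning that category's label.
    return max(scores, key=scores.get)


# Four categories with category 3 as baseline (made-up values).
print(recover_category({1: 0.15, 2: 0.55, 4: 0.10}, baseline_label=3))  # 2
print(recover_category({1: 0.10, 2: 0.20, 4: 0.10}, baseline_label=3))  # 3
```

Because the highest value always identifies some category, this step on its own never produces a missing result. That suggests, but does not establish, that the dropped imputations arise elsewhere, for example in how the rebuilt variable is registered.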
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/