st: MI mvn with categorical data

From   Andrea Bennett <[email protected]>
To   [email protected]
Subject   st: MI mvn with categorical data
Date   Fri, 30 Sep 2011 14:49:05 +0200

Dear Statalisters

I am working with Stata's <mi> command and am trying to impute missing data with <mi impute mvn>. A number of the variables are categorical (some nominal, some ordered). Allison (2002) and Enders (2010) describe a procedure in which one creates dummies for all categories of a variable except a baseline category, imputes the dummies, and then calculates the value for the baseline category as baseline = 1 - category1 - category2 - category4 (for a variable with four categories). The category with the highest imputed value is then coded as 1 and the rest as zero.
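
As a minimal sketch of the rounding step just described, the arithmetic could look like this in Stata on a single completed dataset. All names are hypothetical: cat1, cat2 and cat4 stand for the imputed dummies, cat3 for the omitted baseline category, and pmax and catvar_new are helper names introduced only for illustration. This shows the arithmetic only, not how it should be run on <mi> data.

```stata
* Sketch only: cat1, cat2, cat4 are hypothetical imputed dummies;
* cat3 plays the role of the omitted baseline category.
generate double cat3 = 1 - cat1 - cat2 - cat4

* Largest of the four values within each observation
* (rowmax() ignores missing values and returns missing if all are missing)
egen double pmax = rowmax(cat1 cat2 cat3 cat4)

* Rebuild the categorical variable from the dummy with the largest value.
* Ties go to the lowest-numbered category; observations where everything
* is missing stay missing because of the !missing(pmax) guard.
generate byte catvar_new = .
forvalues k = 1/4 {
    replace catvar_new = `k' if cat`k' == pmax & !missing(pmax) & missing(catvar_new)
}
```

The !missing(pmax) condition matters because Stata evaluates . == . as true, so without it an observation with no imputed values would be assigned category 1.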

I can do that just fine with <mi impute mvn>. After imputation I have complete and valid data (according to <mi describe, detail>). However, when I try to combine the single category dummies back into the original variable, <mi update> decides that some imputations are no longer valid and drops the imputation records for some of the original observations that contain missing values. This also shows up in <mi describe, detail>, which reports e.g. categorical_variable(6; 20*2), meaning that the variable has 6 missing values and that, of these, 2 are still missing in all 20 imputations.

I see no mistake on my side in how I put the dummies back into a single variable; everything looks exactly as it should. Yet as soon as the data are updated, some of the imputed observations get deleted, leaving me with only the original observation with the missing value for that record.

I run the imputation in the <mlong> format. From the Stata manual I understand that under <mlong> I can perform any data manipulation without <mi passive> as long as I manually register the variables afterwards. I also tried to do all the manipulation with <mi passive> or <mi xeq>, but Stata then tells me that I need something better than Stata IC in order to handle more variables. I guess the extra variables are auxiliary variables that Stata needs to perform the imputation, since I am nowhere near the Stata IC maximum myself. Alternatively, it might be related to the fact that some of my statements specifically check that _mi_miss==., so <mi passive> might run into trouble simply because of that.
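
For concreteness, the workflow described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual do-file: catvar_new, cat1, cat2, cat4, x1 and x2 are hypothetical variable names, and the choice to register the rebuilt variable as passive is only one possibility. It does not diagnose why <mi update> drops records.

```stata
* Sketch only; all variable names are hypothetical.
mi set mlong

* Dummies to be imputed, plus fully observed predictors
mi register imputed cat1 cat2 cat4
mi register regular x1 x2

* 20 imputations under the multivariate normal model
mi impute mvn cat1 cat2 cat4 = x1 x2, add(20) rseed(2011)

* ... direct data manipulation on the mlong data goes here,
*     e.g. rebuilding a single categorical variable catvar_new ...

* Register the derived variable by hand, then let mi check consistency
mi register passive catvar_new
mi update

* Report the number of missing values per variable and imputation
mi describe, detail
```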

Do any of you know what exactly is causing the problem and how to solve it? I know I could try <mi impute chained>. From a theoretical point of view, however, I prefer <mvn>, and I have already put quite a bit of effort into the <mvn> analysis. Moreover, I would simply like to know why it does not work.

Any advice would be very welcome,

