Re: st: MI mvn with categorical data
Stas Kolenikov <email@example.com>
Fri, 30 Sep 2011 09:08:21 -0500
If you have Stata 12 or later, you'd want to explore -mi impute chained-.
If you have Stata 11 or earlier, you would want to take a look at -ice- by
Patrick Royston. There is little sense in using a tool that you
yourself know to be inappropriate. If Enders ignores Stata in his
book, it means you have to explore it on your own and find the
best-performing methods, which may not even be mentioned in that book.
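As a rough sketch of what the -mi impute chained- route could look like (Stata 12 or later): the variable names below (region as a nominal variable, educ as an ordered one, income as continuous, age and female as complete predictors) are made up for illustration and are not from the original post, and the number of imputations simply mirrors the 20 mentioned there. It assumes the data are already in memory and that the code is run from a do-file, since it uses the /// line continuation.

```stata
* Hypothetical variable names; replace with your own.
* region = nominal, educ = ordered, income = continuous,
* age and female = fully observed predictors.
mi set mlong
mi register imputed region educ income
mi register regular age female

* Each incomplete variable gets its own conditional model,
* so no dummy coding or rounding of imputed values is needed.
mi impute chained (mlogit) region (ologit) educ (regress) income ///
    = age female, add(20) rseed(12345)

* Confirm that all imputations are complete.
mi describe, detail
```

Because the categorical variables are imputed directly as categories, there is no step where dummies have to be recombined into the original variable after imputation.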
On Fri, Sep 30, 2011 at 7:49 AM, Andrea Bennett <firstname.lastname@example.org> wrote:
> Dear Statalisters
> I am working with Stata's <mi> command and am trying to impute missing data with <mi impute mvn>. A bunch of variables are categorical (some nominal, some ordered). There is a procedure described in Allison (2002) and also in Enders (2010), where it is suggested that one creates dummies for all categories of a variable except a baseline category, imputes the values, and then calculates the missing baseline value by: baseline=1 - category1 - category2 - category4 (for the case of a variable with four categories). The category dummy with the highest probability value is then coded as 1, the rest as zero.
> I can do that just fine with the <mi impute mvn>. I have complete and valid data after imputation (mi describe, detail). However, when I try to put together the single category dummies into the original variable, <mi update> thinks that some imputations are not valid anymore and drops imputation records for some of the original observations containing missing values. This is also shown in <mi describe, detail> where it e.g. says: categorical_variable(6; 20*2), meaning that the variable has 6 missing values, and out of 20 imputations, 2 are missing from all imputation sets.
> I see no mistakes on my side with regard to putting the dummies back into a single variable. That all looks exactly as it should. Yet, as soon as the data are updated, some observations of the imputed data get deleted, leaving me only with the original missing observation for that record.
> I run the imputation in the <mlong> format. From the Stata handbook I understand that under <mlong> I can perform any data manipulation without <mi passive> as long as I manually register the variables afterwards. I also tried to do all manipulation with <mi passive> or <mi xeq>, but Stata tells me that I need something better than Stata IC in order to handle more variables. I guess these are auxiliary variables Stata needs to perform the imputation, since I am nowhere near the maximum of Stata IC. Alternatively, it might be related to the fact that some queries specifically check that _mi_miss==., and hence <mi passive> might run into trouble simply because of that.
> Do any of you know what exactly is causing the problem, and how to solve it? I know I could try <mi impute chained>. From a theoretical point of view, I prefer <mvn>, and I have also already put quite a bit of effort into the <mvn> analysis. Moreover, I simply would like to know why it does not work.
> Any advice would be greatly welcomed,
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.