
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: mi impute, conditional()


From   Stas Kolenikov <skolenik@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: mi impute, conditional()
Date   Thu, 24 Oct 2013 14:40:08 -0500

I have trouble understanding how to specify the -conditional()-
option, and why it has the limitations that it does. I want to impute
missing values for all observations in the data set, but use only
part of the data set to calibrate the imputation model. I thought I
could achieve this with the -conditional()- option, but ran into
limitations that I don't understand.

Suppose I want to impute the repair record using a model fit only on
the domestic cars, as prices for the foreign cars work in a way that
is different from what I am interested in. (I am happy to keep the
existing values, if there are any, for foreign cars, and to impute
their missing values, but using only the model based on the domestic
cars.) So I am trying

sysuse auto, clear
set seed 10203
mi set wide
mi register imputed rep78
mi register regular price weight length foreign
mi impute chained (pmm, cond( foreign==0 ) ) rep78 = price weight length foreign, add(5)

I get an error:

conditional(): imputation variable not constant outside conditional sample;
    rep78 is not constant outside the subset identified by (foreign==0) within the
    imputation sample
 -- above applies to specification (pmm , cond( foreign==0 )) rep78 = price weight length
    foreign
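
A minimal sketch for seeing what the check objects to, assuming the
same auto data: tabulating rep78 over the foreign cars lists the values
outside the conditional sample, which the error message says must be
constant.

```stata
* Sketch only: inspect rep78 among the cars excluded by cond(foreign==0).
* Per the error message, rep78 is expected to be constant in this subset.
sysuse auto, clear
tabulate rep78 if foreign != 0, missing
```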

Why does -mi impute- bother checking what's going on in the
complement of the conditional sample? I understand the medical
examples that the manual gives (I agree that it does not make sense to
ask males about the number of pregnancies, or non-smokers about the
number of cigarettes), but that check is not relevant for every
situation. Insisting on a single non-missing value outside of the
conditional sample (a version of the monotone missing data pattern, I
guess) is extremely restrictive on the part of -mi impute-... which is
supposed to be very general, right? In my example, I don't see any
reason why my data should have this restrictive setup. With -webuse
nlswork-, I might use the salary equation from the 1970s to calibrate
the imputation model, and apply that model to the data from the 1980s,
which may have some real and some missing values that would not fit
the expectations of the -conditional()- approach here.

Is there an override to turn off this check for a constant value
outside the conditional sample?

-- Stas Kolenikov, PhD, PStat (ASA, SSC)
-- Senior Survey Statistician, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name