R: st: R: Imputation vs substitution with mean

From   "Carlo Lazzaro" <>
To   <>
Subject   R: st: R: Imputation vs substitution with mean
Date   Sun, 20 Oct 2013 17:17:33 +0200

I thank Clyde for chiming in on this thread.
I agree with the substance of almost all his clarifications, and I would add
that it is important to reduce the likelihood of missing values when a study
is planned.
The importance of the underlying missingness mechanism and pattern should not
be underestimated either.
In general, I dislike LOCF (and I feel the same about next observation
carried backward, NOCB) because the imputed value may well be an outlier;
however, both approaches can play a role in the "anything goes" setting of
sensitivity analysis when data are suspected to be missing not at random.

However, James asked a general question, so my reply was general as well;
that's why, among other things, I pointed him to -[MI] intro substantive-
in the Stata .pdf manual.

Best regards,

-----Original Message-----
[] On behalf of Clyde Schechter
Sent: Saturday, 19 October 2013 17:56
Subject: Re: st: R: Imputation vs substitution with mean

Carlo Lazzaro has advised James Bernard to use MI rather than substituting
means, and characterized last observation carried forward (LOCF) as "don't
do it," and MI as "the way to go." I think these recommendations need some
qualification.

I agree that MI has better statistical properties than mean substitution and
is preferable in most circumstances.  But there are exceptions.  There are
measurement scales that were developed with ipsative mean imputation of item
non-response as part of their design.  In the modern research setting, one
would probably deal with item non-response through MI instead, but doing so
would, in effect, change the design of the measurement and forfeit any claim
to rely on prior validation studies.
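
As an illustration of what ipsative (person-mean) imputation does, here is a
minimal sketch in plain Python rather than Stata. The function name, the
4-item scale, and the response values are all hypothetical; an actual
instrument's scoring manual would normally also say how many items may be
missing before the scale score is treated as missing, which this sketch does
not model.

```python
from statistics import mean

def person_mean_impute(items):
    """Fill each missing item (None) with the mean of the items that
    the same respondent did answer. Hypothetical helper for illustration."""
    answered = [v for v in items if v is not None]
    if not answered:
        # Nothing to impute from: leave the record untouched.
        return list(items)
    m = mean(answered)
    return [m if v is None else v for v in items]

# Hypothetical respondent who skipped the second of four items.
respondent = [4, None, 2, 3]
filled = person_mean_impute(respondent)
scale_score = sum(filled)
print(filled, scale_score)
```

The imputed value depends only on the respondent's own answers, which is why
replacing it with a model-based MI step changes how the scale is scored.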

I disagree strongly that LOCF is a "don't do it," and that MI is "the way to
go."  If the goal is to get unbiased parameter estimates, then certainly
LOCF is off the table.  But MI only achieves this goal when the data are
missing at random.  And, unfortunately, missingness at random is an
assumption that can never be tested in the data.  When one contemplates the
mechanisms leading to missing data, in some studies (and, in my experience,
this is common) missingness at random may be breathtakingly incredible.
When confronted with data missing not at random, it is not obvious what MI
accomplishes unless you can somehow base it on a valid model of the
missingness mechanisms.  In this setting, unbiased parameter estimates may
be unobtainable by any means, and LOCF may be one reasonable part of a
sensitivity analysis that seeks to find believable upper and lower bounds on
the parameter estimates.
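
To make the sensitivity-analysis idea concrete, the following is a rough
sketch in plain Python rather than Stata. The series and the helper names are
invented for illustration, and the spread it reports is only a crude
indication of how much an estimate moves across single-imputation rules, not
a formal bound.

```python
from statistics import mean

def locf(series):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def nocb(series):
    """Next observation carried backward; trailing gaps stay None."""
    return locf(series[::-1])[::-1]

def mean_substitute(series):
    """Replace every gap with the mean of the observed values."""
    m = mean(v for v in series if v is not None)
    return [m if v is None else v for v in series]

# Hypothetical repeated measurements on one subject, two visits missing.
observed = [10, None, None, 16, 18]

estimates = {
    "LOCF": mean(locf(observed)),
    "NOCB": mean(nocb(observed)),
    "mean substitution": mean(mean_substitute(observed)),
}
for rule, est in estimates.items():
    print(f"{rule}: {est:.2f}")
print("spread:", min(estimates.values()), "to", max(estimates.values()))
```

If the substantive conclusion survives across the whole spread, it is less
dependent on any one assumption about the missing values.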

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA