In its documentation for the -impute- command, the Stata 8 reference manual
states that "[imputation] is not the only method for coping with missing
data, but it is often much better than deleting cases with any missing
data, which is the default."
I'm curious how much agreement there is with that statement. If your
choices were limited to (a) listwise (aka casewise) deletion of missing
data, or (b) filling in imputed values for the missing data (e.g. the
overall mean, a subgroup mean, or a regression estimate of the missing value),
are there indeed situations in which (b) is "often much better"? Listwise
deletion, of course, causes you to lose cases; but imputation can lead to
misleading standard errors and test statistics because these techniques
treat the filled-in values as if they had been observed, and so don't take
into account the uncertainty about the values of the missing data. In
his monograph Missing Data, Allison seems to prefer listwise deletion over
conventional imputation procedures, but I'm not sure what the consensus is
on this.
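
To make the standard-error point concrete, here is a minimal sketch in Python rather than Stata, using simulated data. It assumes a single hypothetical variable y ~ Normal(50, 10) whose values are missing completely at random, and it compares (a) listwise deletion with (b) filling in the overall mean when estimating the mean of y. The function names simulate and mean_and_se, and all parameter values, are illustrative only.

```python
import math
import random
import statistics


def mean_and_se(values):
    """Sample mean and its conventional standard error, s / sqrt(n)."""
    n = len(values)
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(n)


def simulate(n=1000, missing_rate=0.4, seed=1):
    rng = random.Random(seed)
    # Hypothetical data: y ~ Normal(50, 10), values missing completely at random.
    y = [rng.gauss(50, 10) for _ in range(n)]
    y_obs = [v if rng.random() >= missing_rate else None for v in y]

    # (a) Listwise deletion: analyze only the cases where y was observed.
    complete = [v for v in y_obs if v is not None]
    listwise = mean_and_se(complete)

    # (b) Overall-mean imputation: replace each missing y with the observed
    # mean, then analyze all n cases as if every value had been measured.
    fill = statistics.fmean(complete)
    imputed = [v if v is not None else fill for v in y_obs]
    mean_imputed = mean_and_se(imputed)

    return len(complete), listwise, mean_imputed


if __name__ == "__main__":
    n_complete, (m_a, se_a), (m_b, se_b) = simulate()
    print(f"complete cases: {n_complete}")
    print(f"(a) listwise:        mean={m_a:.3f}  SE={se_a:.3f}")
    print(f"(b) mean imputation: mean={m_b:.3f}  SE={se_b:.3f}")
```

Both approaches give the same point estimate, but (b) always reports a smaller standard error. The imputed cases add nothing to the sum of squared deviations while inflating n. With n_c complete cases out of n, the ratio of the two SEs is sqrt(n_c(n_c - 1) / (n(n - 1))), roughly n_c / n. That precision is spurious, because no new information was collected.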
I realize that there are advanced methods that may be better than (a) or
(b); but if your choice is only between (a) and (b), is it really the case
that (b) is often much better (or did the manual writers just make that up)?
Also, I'm curious whether people agree with me that, rightly or wrongly,
listwise deletion is the most common strategy for dealing with missing
data. It seems like many of the more advanced techniques are not well
understood and/or are not well implemented in statistical software. For
example, Stata has some user-written routines (e.g., -hotdeck-), but the
built-in support for handling missing data seems pretty limited.