st: impute command for missing data

From   Richard Williams
Date   Wed, 05 Jan 2005 09:49:08 -0500

In its documentation for the -impute- command, the Stata 8 reference manual states that "[imputation] is not the only method for coping with missing data, but it is often much better than deleting cases with any missing data, which is the default."

I'm curious how much agreement there is with that statement. If your choices were limited to (a) listwise (aka casewise) deletion of missing data, or (b) filling in imputed values for the missing data (e.g. the overall mean, a subgroup mean, or a regression estimate of the missing value), are there indeed situations in which (b) is "often much better"? Listwise deletion, of course, causes you to lose cases. But imputation can lead to misleading standard errors and test statistics, because these techniques treat the filled-in values as if they had actually been observed and so don't take into account the uncertainty about the values of the missing data. In his monograph Missing Data, Allison seems to prefer listwise deletion over conventional imputation procedures, but I'm not sure what the consensus is on this.
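
To make the standard-error concern concrete, here is a minimal sketch in Python. It is not Stata and not a reproduction of what -impute- does. It uses the simplest variant of (b), overall-mean imputation, on simulated normal data with values deleted completely at random. The function name `simulate`, its parameters, and the data-generating setup are all hypothetical choices for illustration. Because every filled-in value sits exactly at the mean, it adds nothing to the spread but still counts toward n, so the naive standard error shrinks even though no new information has been gained.

```python
import math
import random
import statistics


def simulate(n=1000, p_missing=0.3, seed=2005):
    """Hypothetical illustration: compare listwise deletion with
    overall-mean imputation on a simulated normal sample whose values
    are deleted completely at random (an assumed missingness model)."""
    rng = random.Random(seed)
    full = [rng.gauss(50, 10) for _ in range(n)]
    # None marks a missing value; each value is dropped with prob. p_missing
    data = [x if rng.random() >= p_missing else None for x in full]

    # (a) listwise deletion: analyze only the observed cases
    observed = [x for x in data if x is not None]
    mean_a = statistics.mean(observed)
    se_a = statistics.stdev(observed) / math.sqrt(len(observed))

    # (b) mean imputation: fill every gap with the observed mean,
    # then (naively) treat all n values as if they were real data
    filled = [x if x is not None else mean_a for x in data]
    mean_b = statistics.mean(filled)
    se_b = statistics.stdev(filled) / math.sqrt(len(filled))

    return len(observed), mean_a, se_a, mean_b, se_b


if __name__ == "__main__":
    n_obs, mean_a, se_a, mean_b, se_b = simulate()
    print(f"observed cases: {n_obs} of 1000")
    print(f"(a) listwise:  mean={mean_a:.3f}  SE={se_a:.4f}")
    print(f"(b) mean-imp.: mean={mean_b:.3f}  SE={se_b:.4f}")
```

Both approaches give the same point estimate here. The naive standard error under (b), however, is smaller by a factor of roughly n_obs/n (about 0.7 with 30% missing). The cause is purely that the imputed cases are counted as genuine observations. Subgroup-mean or regression imputation reduces the distortion but does not remove it, because the imputed values still carry no residual noise.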

I realize that there are advanced methods that may be better than (a) or (b); but if your choice is only between (a) and (b), is it really the case that (b) is often much better (or did the manual writers just make that up)?

Also, I'm curious whether people would agree with me that, rightly or wrongly, listwise deletion is the most common strategy for dealing with missing data. It seems like many of the more advanced techniques are not well understood and/or are not well implemented in statistical software. For example, Stata has some user-written routines (e.g. -hotdeck-), but its built-in support for handling missing data seems pretty limited.

