The Stata listserver

st: RE: impute command for missing data

From   Leonelo Bautista <[email protected]>
To   [email protected]
Subject   st: RE: impute command for missing data
Date   Wed, 05 Jan 2005 11:44:05 -0600

Aside from statistical issues, listwise deletion could lead to selection
bias, because units or subjects with missing data may be systematically
different from those without missing data. Naturally, what's best also
depends on specific circumstances. That said, the nature of the
statistical problem with single imputation is very well understood
(artificially increased precision), whereas the direction and magnitude of
the potential bias resulting from casewise deletion may be difficult to judge.
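
To make the selection-bias point concrete, here is a minimal simulated sketch in Python (not Stata, and not taken from either post). The data, the variable names, and the missingness rule (larger values are more likely to be missing) are all assumptions chosen purely for illustration.

```python
import random
import statistics

random.seed(12345)  # fixed seed so the illustration is reproducible

# Hypothetical outcome: 5,000 values drawn from a normal(50, 10) distribution.
full = [random.gauss(50, 10) for _ in range(5000)]


def is_missing(y):
    """Assumed missingness rule: values above 55 are lost 60% of the time,
    all other values 10% of the time, so missingness depends on the value."""
    p = 0.6 if y > 55 else 0.1
    return random.random() < p


# Listwise deletion keeps only the cases that were actually observed.
observed = [y for y in full if not is_missing(y)]

full_mean = statistics.mean(full)
complete_case_mean = statistics.mean(observed)

print(f"mean of all cases:    {full_mean:.2f}")
print(f"mean after deletion:  {complete_case_mean:.2f}")
print(f"cases kept:           {len(observed)} of {len(full)}")
```

Under this assumed rule the complete-case mean falls below the mean of all cases. In real data the rule is unknown, so neither the direction nor the size of the shift can be read off the observed cases alone.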


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Richard Williams
Sent: Wednesday, January 05, 2005 8:49 AM
To: [email protected]
Subject: st: impute command for missing data

In its documentation for the -impute- command, the Stata 8 reference manual 
states that "[imputation] is not the only method for coping with missing 
data, but it is often much better than deleting cases with any missing 
data, which is the default."

I'm curious how much agreement there is with that statement.  If your 
choices were limited to (a) listwise (aka casewise) deletion of missing 
data, or (b) filling in imputed values for the missing data (e.g. the 
overall mean, a subgroup mean, or a regression estimate of the missing
value), are there indeed situations in which (b) is "often much better"? Listwise
deletion, of course, causes you to lose cases; but imputation can lead to
misleading standard errors and test statistics because such techniques don't
take into account the uncertainty about the values of the missing data.  In
his monograph on Missing Data, Allison seems to prefer listwise deletion
over conventional imputation procedures, but I'm not sure what the consensus
is on this.
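
As a rough illustration of the standard-error concern, the following Python sketch (not Stata, and not part of the original post) compares the standard error of a mean under listwise deletion with the one obtained after filling in the observed mean. The ten data values and the helper name `standard_error` are made up for the example.

```python
import math
import statistics

# Hypothetical sample in which 4 of 10 values are missing (None).
data = [12.0, None, 15.0, 9.0, None, 11.0, 14.0, None, 10.0, None]


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# (a) Listwise deletion: analyze only the observed values.
complete = [x for x in data if x is not None]

# (b) Single imputation: replace each missing value with the observed mean.
observed_mean = statistics.mean(complete)
imputed = [observed_mean if x is None else x for x in data]

se_complete = standard_error(complete)
se_imputed = standard_error(imputed)

print(f"listwise deletion: n = {len(complete)}, SE = {se_complete:.3f}")
print(f"mean imputation:   n = {len(imputed)}, SE = {se_imputed:.3f}")
```

The filled-in values add nothing to the sum of squared deviations but do raise n, so the standard error after mean imputation is smaller than the complete-case one even though no new information was collected. The point estimate of the mean is unchanged.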

I realize that there are advanced methods that may be better than (a) or 
(b); but if your choice is only between (a) and (b), is it really the case 
that (b) is often much better (or did the manual writers just make that up)?

Also, just curious if people would agree with me that, rightly or wrongly, 
listwise deletion is the most common strategy for dealing with missing 
data?  It seems like many of the more advanced techniques are not well 
understood and/or are not well implemented in statistical software.  For 
example, Stata has some user-written routines (e.g. -hotdeck-) but the 
built-in support for handling missing data seems pretty limited.
