The Stata listserver

Re: st: RE: Efficient handling of missing data


From   Michael Ingre <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: RE: Efficient handling of missing data
Date   Thu, 22 Jan 2004 11:55:41 +0100

Nick Cox wrote:

> I don't think it is only you.

That's what I thought, too.

> all concerned, StataCorp included, would
> agree that there is a lot of interesting,
> important stuff in this field not implemented in Stata.

Couldn't agree more. Stata is a very capable package, but in this area the
competition seems to be ahead.

> A Stata friend, not a Statalist member, has
> just added to the repertoire of Stata programs
> for missing data.
> 
> I'll not mention a name just in case that person
> prefers not to go public yet, but I'll
> ask privately whether the time is ripe
> for an SSC release.

I'm very interested in your friend's work, and I appreciate your taking the
time to look into it.

> It's not obvious, however, that there is
> one clear leader among techniques that
> really is top of the should-be-implemented
> list.

There are competing techniques out there, but I would suggest Multiple
Imputation (MI) as a particularly interesting one. MI is not a completely
homogeneous technique (there are several variants), but the most powerful
procedures make use of Markov Chain Monte Carlo (MCMC) simulation.
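To make the MCMC idea a bit more concrete, here is a minimal sketch of data augmentation, which alternates an imputation step (draw the missing values given the current parameters) with a posterior step (draw the parameters given the completed data). It assumes a deliberately simple setting: one outcome y with values missing at random, one fully observed predictor x, a normal linear model, and a flat prior. It is written in Python rather than Stata. The function names `draw_params` and `impute`, as well as the burn-in and thinning values, are hypothetical choices for illustration. Real MI software also handles general multivariate missingness patterns and convergence diagnostics, which this toy leaves out.

```python
import random


def draw_params(x, y, rng):
    # P-step (hypothetical helper): draw (a, b, sigma2) for the model
    # y = a + b*(x - xbar) + e, e ~ N(0, sigma2), from the completed-data
    # posterior under the flat prior p(a, b, sigma2) proportional to 1/sigma2.
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    ssr = sum((yi - ybar - b_hat * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    # sigma2 | y ~ SSR / chi2(n - 2); chi2(k) is Gamma(shape k/2, scale 2)
    sigma2 = ssr / rng.gammavariate((n - 2) / 2, 2)
    # Centering x makes the intercept and slope draws independent
    a = rng.gauss(ybar, (sigma2 / n) ** 0.5)
    b = rng.gauss(b_hat, (sigma2 / sxx) ** 0.5)
    return a, b, sigma2, xbar


def impute(x, y, m=5, burn_in=200, thin=50, seed=1):
    # Data augmentation sampler (hypothetical): y uses None for missing
    # values; returns m completed copies of y, taken every `thin`
    # iterations after `burn_in` so that the draws are roughly independent.
    rng = random.Random(seed)
    miss = [i for i, yi in enumerate(y) if yi is None]
    obs = [yi for yi in y if yi is not None]
    start = sum(obs) / len(obs)
    # Crude starting values: fill the gaps with the observed mean
    y_cur = [start if yi is None else yi for yi in y]
    datasets = []
    for it in range(1, burn_in + m * thin + 1):
        a, b, sigma2, xbar = draw_params(x, y_cur, rng)  # P-step
        for i in miss:                                     # I-step
            y_cur[i] = rng.gauss(a + b * (x[i] - xbar), sigma2 ** 0.5)
        if it > burn_in and (it - burn_in) % thin == 0:
            datasets.append(list(y_cur))
    return datasets
```

Each returned list is one completed dataset. The observed values are left untouched, and only the gaps vary from one imputation to the next. You would then fit your usual model to each dataset and combine the results.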

The main advantage is that, after imputation, standard techniques can be
used to analyze the data. There is no need to develop special procedures for
analyzing data with missing values; the old ones work just fine. The results
from the analyses of all the imputed datasets do need to be combined, but
tools for this are already available in Stata (SJ3-3 st0042) and should
work with most estimation commands.
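The combining step follows Rubin's (1987) rules. You average the m point estimates. The total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance, and a degrees-of-freedom value is used for t-based inference. Below is a minimal Python sketch of those rules for a single scalar parameter. The function name `rubin_combine` is hypothetical, and this is not the st0042 code. The sketch assumes m ≥ 2 and a positive average within-imputation variance.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    # Hypothetical helper: combine m point estimates and their squared
    # standard errors from m imputed datasets using Rubin's rules.
    m = len(estimates)
    q_bar = mean(estimates)          # combined point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    # Relative increase in variance due to nonresponse
    r = (1 + 1 / m) * b / u_bar
    # Rubin's degrees of freedom; infinite if the imputations all agree
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return q_bar, t, df
```

For example, `rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)` gives a combined estimate of 1.0 and a total variance of 0.04 + 1.2 × 0.025 = 0.07, so the standard error is about 0.26.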

I wonder whether StataCorp has already recognized this as a future (or even
current) area of development, or whether someone is already working on a
program for MI. I recall an earlier statement from William Gould:

> ... we are adopting
> a policy of releasing some new features as we go, for free ...

because

> ... We were seeing cases where
> unknowing users were spending time implementing features which we had already
> implemented, and yet we were sworn to secrecy. ...

It makes me wonder about my (and possibly others') future strategy for
handling missing data in Stata. Should I stay or should I go?


Michael


Some introductory texts and links for those of you who are interested:

Schafer, J. L. 1999. Multiple imputation: a primer. Statistical Methods in
Medical Research 8: 3-15.

SAS technical report:
http://support.sas.com/rnd/app/papers/multipleimputation.pdf

Schafer - technical reports on MI: http://www.stat.psu.edu/~jls/
Schafer - FAQ: http://www.stat.psu.edu/~jls/mifaq.html

And these standard texts, which I haven't had time to read (yet):

Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys.  New
York: John Wiley & Sons.

Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data.  London:
Chapman & Hall.



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


