
From: "Rodrigo Alfaro" <ralfaro@bu.edu>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: Re: Multiple Imputation and other Missing Data

Date: Thu, 24 Nov 2005 00:09:03 -0500

Hi,

A simple paper about multiple imputation is available at http://gking.harvard.edu/files/abs/evil-abs.shtml; it is really easy to read. That page also offers free software for the MAR case that uses a Bayesian technique.

Rodrigo

----- Original Message ----- From: "Groves" <Groves@tamu.edu>

To: "stata" <statalist@hsphsun2.harvard.edu>

Sent: Wednesday, November 23, 2005 8:53 PM

Subject: st: Multiple Imputation and other Missing Data

Dear list,

I'm hoping that one or more of you can take a minute or two to skim through
the following several paragraphs and provide some feedback to someone
marginally familiar with statistics in general (and missing data techniques
in particular) about whether these paragraphs make any sense and whether
they provide any indication about the knowledge level of their author.

A student wrote these paragraphs in an attempt to summarize the methods he
used to conduct a longitudinal analysis purportedly employing some type of
advanced missing data technique. These paragraphs were intended for
publication on a jointly authored manuscript, and thus my desire to make
certain that they are reasonably accurate.

From my read, however, I am unable to determine what type of missing data
technique the student is claiming to have used and, more generally, whether
any of these explanations even make sense. Unfortunately, although I can
sense that something is wrong, I lack the experience to put my finger on
the exact problem. Succinctly, would you advise for or against placing
one's name as a co-author on a paper containing the following paragraphs
(even after a reasonable degree of copy-editing)? Any feedback (on or
off-list) would be greatly appreciated. I've extracted the sections that
appear most relevant. The section entitled "Missing Data," however, is
nearly complete as provided by the student and was intended to be a full
description of relevant missing data issues and an explanation for the
exact techniques used in the present research.

Thanks in advance for your comments.

*************************************************************************

Data Collection

The data used in this study were collected starting in 19** from a
target population made up of the seventh graders enrolled in a random half
of all the junior high schools in the **** School District. These
adolescents were surveyed again in 19**, and in 19**.

The selection criteria needed for the subjects to be included in the
sample were that they provided data in both the Time 1 and Time 2 data
collection waves ... .

Missing Data

Ignorable missing data is usually a product of two types of
mechanisms, missing completely at random (MCAR) and missing at random
(MAR). Data is MCAR when a subject's nonresponse to a question is not
dependent on any other measured or observed variable related to the
subject, study, or the question itself. If a subject's nonresponse to a
given question is contingent on subject characteristics or a previous
response, but not dependent on the question itself then the data are
considered MAR (Rubin 1976, Enders and Bandalos 2001). It should be
evident that MCAR is the stronger assumption, because data that is MCAR is
also MAR.
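
To make the MCAR/MAR distinction in the paragraph above concrete, here is a minimal simulation sketch in Python. It is an illustration only and not part of the student's draft; the function name, sample size, and missingness probabilities are all assumptions chosen for the example. It compares the mean of the observed cases under each mechanism with the mean of the full data.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Return (full mean of y, observed-case mean under MCAR, under MAR)."""
    rng = random.Random(seed)

    # x is always observed; y depends on x and may be missing.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]

    # MCAR: every y has the same 30% chance of being missing,
    # regardless of x or y.
    mcar_observed = [yi for yi in y if rng.random() > 0.3]

    # MAR: the chance that y is missing depends only on the observed x
    # (60% when x > 0, 10% otherwise), not on y itself given x.
    mar_observed = [
        yi
        for xi, yi in zip(x, y)
        if rng.random() > (0.6 if xi > 0 else 0.1)
    ]

    return (
        statistics.fmean(y),
        statistics.fmean(mcar_observed),
        statistics.fmean(mar_observed),
    )


if __name__ == "__main__":
    full, mcar, mar = simulate()
    print(f"full data mean:       {full:.3f}")
    print(f"observed mean (MCAR): {mcar:.3f}")
    print(f"observed mean (MAR):  {mar:.3f}")
```

Under these assumed settings the MCAR observed-case mean stays close to the full-data mean, while the MAR observed-case mean is pulled downward, because cases with high x (and therefore high y) are missing more often. The point of MAR-based methods is that this bias can be corrected by conditioning on the observed x.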

Missing data in the variables reported here are assumed to be the less
restrictive MAR type; however, given the nature of the variables it is
possible that subjects' responses might not even meet MAR
assumptions. There are several methods for addressing missing data. Such
methods include theory based direct maximum likelihood or full information
likelihood (FIML), listwise and pairwise deletion and different forms of
multiple imputation. In general, the majority of recent research into the
efficiency of missing data methods has shown that direct maximum likelihood
techniques outperform all other methods (Enders 2001, Little and Rubin 1987).

One drawback of the direct ML method is that it assumes multivariate
normality similar to all ML estimation methods. However, little is known
about how these methods work in the presence of nonnormal data and/or
clustered data such as used for the study. If it follows other ML
estimation techniques, then most likely parameters will be increasingly
biased as the degree of nonnormality and clustering increase. One form of
imputation called the similar response pattern method has been implemented
in PRELIS 2, which is a preprocessor for the LISREL program (Joreskog and
Sorbom 1996).

The method attempts to impute real values from another case with
similar observed values by using a minimization routine based on a set of
matching variables. If the routine cannot find a case with complete data
using the matching variables then the missing value for that variable is
not imputed into the case and remains missing. A study by Brown (1994)
found that compared to listwise and pairwise deletion, mean imputation, and
hot-deck imputation, similar response pattern imputation produced the least
bias overall in regard to structural and measurement model
parameters. However, he did find that there was some positive bias in the
error estimates indicating that Type 1 error rates would be larger than
normal.
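
The donor-matching idea described in the paragraph above can be sketched roughly as follows. This is a simplified illustration in Python and not PRELIS's actual routine: the function name `impute_similar_pattern`, the dict-based records, and the unstandardized squared-distance criterion are all assumptions made for the example.

```python
def impute_similar_pattern(rows, target, match_vars):
    """Return copies of rows with missing `target` values filled from the
    closest donor case (hypothetical sketch, not the PRELIS algorithm).

    rows       -- list of dicts mapping variable name -> float or None
    target     -- name of the variable to impute
    match_vars -- names of the matching variables
    """

    def complete(row, names):
        # True when the row has a non-missing value for every name.
        return all(row.get(v) is not None for v in names)

    # Donors: cases with an observed target and complete matching variables.
    donors = [
        r for r in rows
        if r.get(target) is not None and complete(r, match_vars)
    ]

    result = []
    for row in rows:
        new_row = dict(row)  # leave the input untouched
        needs_value = row.get(target) is None
        if needs_value and donors and complete(row, match_vars):
            # Minimize the sum of squared differences on the matching
            # variables and copy the closest donor's observed value.
            best = min(
                donors,
                key=lambda d: sum((d[v] - row[v]) ** 2 for v in match_vars),
            )
            new_row[target] = best[target]
        # Otherwise the value stays missing, as the text describes.
        result.append(new_row)
    return result


if __name__ == "__main__":
    cases = [
        {"x1": 1.0, "x2": 2.0, "y": 10.0},
        {"x1": 5.0, "x2": 5.0, "y": 50.0},
        {"x1": 1.1, "x2": 2.1, "y": None},   # close to the first case
        {"x1": None, "x2": 3.0, "y": None},  # cannot be matched
    ]
    for case in impute_similar_pattern(cases, "y", ["x1", "x2"]):
        print(case)
```

Because this sketch uses raw squared differences, a matching variable measured on a large arbitrary scale would dominate the distance, so the choice and scaling of matching variables matters a great deal in any procedure of this kind.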

Although there is no statistical theory that would support this method
over direct missing data methods, the fact that it imputes values from
similar cases is attractive, because of the clustered nature of the second
generation data. If it is plausible that children from the same family
would have more similar responses to each other than to children from other
families then possibly imputing a value from a respondent's sibling does
have some validity. As suggested in the PRELIS manual a large number of
matching variables were used, including subject identification numbers,
that were not otherwise used to select the subjects or used in any of the
model estimations as moderators, indicators, or other variables.

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

