
Re: st: Re: When to impute - and an alternative

From   David Airey <[email protected]>
To   [email protected]
Subject   Re: st: Re: When to impute - and an alternative
Date   Mon, 3 Dec 2007 08:43:54 -0600


lmList is a function that runs a linear model on each group defined by the variable after the pipe "|". There are several ways of doing this in Stata and saving the results, e.g., statsby.

There is a similar function in nlme for nonlinear least squares, nlsList, which does the same and is useful for figuring out which parameters show variation that could be fitted by a random effect when nlme is used.

The rest seems to be extracting the coefficients and standard errors from the results. "Sumary" seems misspelled.
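
For readers who use neither R nor S-PLUS, here is a minimal sketch of the idea in plain Python: fit a separate straight line within each group, then pull out the coefficients and their standard errors. The function names (fit_line, fit_by_group) and the data are made up for illustration, and the sketch handles only one predictor; it is not a replacement for lmList or statsby.

```python
from collections import defaultdict
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with standard errors."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    # Residual variance is estimated within this group only (no pooling across groups).
    sigma2 = rss / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": sqrt(sigma2 * (1 / n + mx ** 2 / sxx)),
        "se_slope": sqrt(sigma2 / sxx),
    }

def fit_by_group(rows):
    """rows holds (group, x, y) tuples; returns one fitted line per group."""
    groups = defaultdict(list)
    for g, x, y in rows:
        groups[g].append((x, y))
    return {
        g: fit_line([x for x, _ in pts], [y for _, y in pts])
        for g, pts in sorted(groups.items())
    }

# Made-up data: two groups with four observations each.
rows = [
    (1, 0, 1.0), (1, 1, 3.0), (1, 2, 5.0), (1, 3, 7.0),
    (2, 0, 2.0), (2, 1, 1.0), (2, 2, 4.0), (2, 3, 3.0),
]
fits = fit_by_group(rows)
for g, f in fits.items():
    print(g, f)
```

Each entry of fits corresponds to one group, which is roughly what coef() and summary() return per group for an lmList fit.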


On Dec 3, 2007, at 5:35 AM, Paul Seed wrote:

--- David Airey <[email protected]> wrote:
> I have trouble understanding the translation of these three missing
> situations into when it is useful to impute.

The three situations are MCAR (missing completely at random),
MAR (missing at random) and NMAR (not missing at random).
Analysis of complete data only can be biassed for MAR & NMAR.
Imputation is unnecessary with MCAR.
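
A small simulation can make the distinction concrete. This is only a sketch on made-up data, written in plain Python: the outcome y depends on an always-observed covariate x, and y is dropped either by a coin flip (MCAR) or whenever x is positive (one simple MAR mechanism, chosen for illustration). It looks at the complete-case mean of y only; other estimands can behave differently.

```python
import random

random.seed(2007)
n = 20000

# Made-up data: x is always observed, y = x + noise, so the true mean of y is 0.
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

def mean(values):
    return sum(values) / len(values)

# MCAR: each y is kept with probability 0.5, regardless of x or y.
mcar_observed = [yi for yi in y if random.random() < 0.5]

# MAR: y is missing whenever the observed covariate x is above 0.
mar_observed = [yi for xi, yi in zip(x, y) if xi <= 0]

full_mean = mean(y)
mcar_mean = mean(mcar_observed)
mar_mean = mean(mar_observed)

print("full data mean      :", round(full_mean, 3))
print("complete cases, MCAR:", round(mcar_mean, 3))
print("complete cases, MAR :", round(mar_mean, 3))
```

Under the MCAR mechanism the complete-case mean stays close to the full-data mean, while under this MAR mechanism it is pulled well below it, because only the low-x cases remain.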

Here's a very practical approach :

1) Build a regression model to predict who is likely to go missing,
using the predictors you would use for multiple imputation.
Is it reasonably powerful?

If not, there is no point in imputing. Your data is probably MCAR.
It is not MAR. Imputing will not help.

2) Calculate a prediction score from the logistic regression.
Now compare this score with the (non-missing) outcomes.
If there is no relationship, there is no correctable bias.

If your data passes tests 1) and 2), imputation is probably called for,
as MAR is a possibility. However, NMAR remains an issue.
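
Tests 1) and 2) might be sketched as follows, in plain Python on simulated data. Everything here is an assumption made for illustration: the single predictor x, the hand-rolled Newton-Raphson logistic fit (fit_logistic), the use of the AUC as the measure of "reasonably powerful", and a plain correlation as the comparison in test 2). In practice one would use a packaged logistic regression with all the intended imputation predictors.

```python
import random
from math import exp, sqrt

def fit_logistic(x, m, iters=25):
    """Newton-Raphson fit of P(m = 1) = 1 / (1 + exp(-(a + b * x)))."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, mi in zip(x, m):
            p = 1.0 / (1.0 + exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += mi - p            # score for the intercept
            g1 += (mi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def auc(scores, m):
    """Share of (missing, observed) pairs in which the missing case scores higher."""
    pos = [s for s, mi in zip(scores, m) if mi == 1]
    neg = [s for s, mi in zip(scores, m) if mi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / sqrt(suu * svv)

# Simulated data: missingness of y depends on the observed predictor x.
random.seed(1)
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]
missing = [1 if random.random() < 1 / (1 + exp(-xi)) else 0 for xi in x]

# Test 1: does the model predict who goes missing?
a, b = fit_logistic(x, missing)
score = [1 / (1 + exp(-(a + b * xi))) for xi in x]
step1_auc = auc(score, missing)

# Test 2: is the prediction score related to the outcomes we did observe?
step2_corr = pearson(
    [s for s, mi in zip(score, missing) if mi == 0],
    [yi for yi, mi in zip(y, missing) if mi == 0],
)
print("slope:", round(b, 2), "AUC:", round(step1_auc, 2), "corr:", round(step2_corr, 2))
```

In this simulated example both checks come out clearly away from "no relationship", which is the situation where the post suggests imputation is worth considering.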

3) Consider if there could be an unobserved process
causing people with extreme values of the outcome to go missing.
If you (and your non-statistical collaborators) judge this to be
implausible, your data is probably not NMAR.

Either way you should mention the possibility of NMAR
and the size & direction of any likely bias caused in the
discussion section of the paper.

A very interesting new paper on this subject is
Diggle, Farewell & Henderson
Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal.
Appl. Statist. (2007) 56 (5) 499-550 (with discussion).

As the title implies, it contains a new method,
based on martingale assumptions and difference
scores. Their method is unbiassed under
MAR and under certain versions of NMAR
(when the martingale assumptions are valid) &
is therefore superior to multiple imputation.

They claim the method is very easy to implement using
standard software, and they give 4 lines of S-PLUS:

fit <- lmList(PANSS~ tesat|time, data = schizophrenioa, pool=F)
apply(coef(fit),2, cumsum)
SEs <- sumary(fit)$coef[,"Std. Error",]
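
For anyone unsure what the second line does: apply(x, 2, cumsum) takes running totals down each column, so with one row of coefficients per time point it accumulates them over time. A minimal Python sketch of that one step, using made-up coefficient values and hypothetical column names:

```python
from itertools import accumulate

# Hypothetical per-time-point coefficients, one row per time point (made-up numbers).
coefs = [
    {"intercept": 1.0, "slope": 0.5},
    {"intercept": -0.25, "slope": 0.25},
    {"intercept": 0.5, "slope": -0.5},
]

# Running total of each coefficient across time points, column by column.
cumulative = {
    name: list(accumulate(row[name] for row in coefs))
    for name in coefs[0]
}
print(cumulative)
# {'intercept': [1.0, 0.75, 1.25], 'slope': [0.5, 0.75, 0.25]}
```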

If anyone is familiar with R or S PLUS, and in particular with the lmList
command from the -nlme- package (Pinheiro & Bates 2000,
"Mixed-Effects Models in S and S-PLUS", NY, Springer),
and could translate these 4 lines into Stata, they would be doing
a great favour to the Stata community.

Paul T Seed MSc CStat
Senior Lecturer in Medical Statistics

King's College London
Division of Reproduction and Endocrinology

St Thomas' Hospital,
Lambeth Palace Road,
London SE1 7EH

tel (+44) (0) 20 7188 3642
fax (+44) (0) 20 7620 1227

* For searches and help try:
David C. Airey, Ph.D.
Pharmacology Research Assistant Professor
Center for Human Genetics Research Member

Department of Pharmacology
School of Medicine
Vanderbilt University
Rm 8158A Bldg MR3
465 21st Avenue South
Nashville, TN 37232-8548

TEL   (615) 936-1510
FAX   (615) 936-3747
EMAIL [email protected]
