
Re: st: Multiple Imputation on Panel Data: all variables have missing data, and the panels are expanding

From   Maarten buis <>
To   stata list <>
Subject   Re: st: Multiple Imputation on Panel Data: all variables have missing data, and the panels are expanding
Date   Tue, 7 Sep 2010 08:57:10 +0000 (GMT)

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

--- On Tue, 7/9/10, Jennie Day wrote:
> I am trying to use STATA 11's mi commands to do multiple imputations on
> a panel dataset where all variables have some missing values (save the
> unique ID numbers and the time variable). The missing rate varies from
> 1% to 10%.
> I would like to ask you what options STATA offers me to move forward. 
> I understand the issues with this level of missingness.  Listwise
> deletion is not an option because 1) the  missingness is very likely
> MNAR, 

The terminology used in multiple imputation is confusing. So to make sure
we are talking about the same thing: a missing data pattern can be MCAR,
MAR, or NMAR:

Missing Completely At Random (MCAR) means that the probability of being
missing is completely random: it depends neither on the observed values
nor on the unobserved missing values.

Missing At Random (MAR) means that the probability of being missing may
depend on the observed values, but not on any of the unobserved missing
values.

Not Missing At Random (NMAR) means that the probability of being missing
also depends on the unobserved missing values.
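
For concreteness, here is a small simulated sketch of the three patterns
(all variable names and cutoffs are invented for illustration):

```stata
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate y = x + rnormal()

* MCAR: whether y is missing is pure chance
generate y_mcar = cond(runiform() < .2, ., y)

* MAR: whether y is missing depends only on the observed x
generate y_mar = cond(runiform() < invlogit(x), ., y)

* NMAR: whether y is missing depends on y itself
generate y_nmar = cond(runiform() < invlogit(y), ., y)

* Compare the observed means with the full-data mean of y: under MAR the
* bias can be corrected using x; under NMAR it cannot.
summarize y y_mcar y_mar y_nmar
```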

In the NMAR case multiple imputation can't help you: it can only deal
with MAR and MCAR. If you believe that your missing data pattern is NMAR,
then empirical research is pretty much impossible. How can you do
empirical research on values that you have never observed? You can do
things like variations on -heckman-, but I would view those results more
as a theoretical simulation/scenario study than as empirical research.
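
A hypothetical sketch of such a -heckman- variation (all variable names
are invented; z stands for an exclusion restriction, a variable assumed
to affect selection but not the outcome):

```stata
* Outcome y on regressors x1 x2, with a selection equation that also
* includes the exclusion restriction z; -twostep- requests Heckman's
* two-step estimator rather than full ML.
heckman y x1 x2, select(x1 x2 z) twostep
```

How credible the results are depends almost entirely on how defensible z
is as an exclusion restriction.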

> and 2) it would make the sample size too small to be useful. 

That suggests to me that you are using too many variables in your model.
With 1-10% missingness per variable, you can add a few variables without
losing too big a proportion of your data.
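
As a rough back-of-the-envelope check (assuming, for simplicity, that
missingness is independent across variables): with k variables each
missing a fraction p of the time, the expected share of complete cases
is (1-p)^k.

```stata
* 10 variables at 5% missingness each: roughly 60% complete cases remain
display (1 - .05)^10
```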

> STATA doesn't offer pairwise deletion, so I'd have to code this up
> myself.  Plus, as you said in another thread, pairwise deletion
> generates worse biases than listwise deletion according to
> (Allison 2002).

Correct, that is not the solution.

> So here's my situation: I have a rich panel dataset from a developing
> country that could yield some interesting policy results.  It is the
> unfortunate consequence of working on data from a developing country,
> that the data has missing values. 

That is true everywhere.

> I've tried the mi functions using mvn (the multivariate normal
> estimation option), and I get error messages like the ones copied below.
> I've read in STATA's MI manual that doing univariate estimations for
> multiple imputations is incorrect procedure if the results are not used
> in independent analyses.  I understand this, but it may be my only
> option.  You said in your last email that I might need to code up my
> own imputation method.

First, you need to be very, very well informed about how multiple
imputation works, why it works, and the statistical theory underlying it
before you attempt to do this. So I recommend that you not do this.

Second, if you want to invent your own imputation command, and you have
the skills and knowledge, then I would start with the model of interest
and work backwards, rather than start with inventing an imputation model.

Third, the algorithm you proposed is somewhat similar to the algorithm
used by -ice- (see: -findit ice-), so if you really want to go that route,
I would start there.
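
If you do go that route, a hypothetical -ice- call might look like this
(variable names invented; -ice- is a user-written command, so it must be
installed first, e.g. via -ssc install ice-):

```stata
* Impute y x1 x2 by chained equations, creating 5 imputed datasets
ice y x1 x2, m(5) saving(imputed, replace)

* Bring the imputed datasets into Stata 11's official -mi- system
use imputed, clear
mi import ice
```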

However, I would not go this route. Instead, I would just start by
estimating my model of interest while ignoring the missing values.
Afterwards you can try to do "other stuff", but don't expect too much
from it; that is, if it leads to different conclusions, expect that
to be the result of an error in the way you applied the "other stuff".
Given this, you could just as well not do the "other stuff", as this
way you are only going to confirm your earlier conclusions.
Corrections for missing values could legitimately lead to different
conclusions, but in practice the models are just too fragile to believe
such changes.
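
A sketch of that order of operations (all variable names hypothetical):
fit the complete-case model first, then try the "other stuff", here
Stata 11's official -mi- with the mvn method:

```stata
* Model of interest; Stata drops incomplete cases automatically
regress y x1 x2

* The "other stuff": multiple imputation via multivariate normal
mi set mlong
mi register imputed y x1 x2
mi impute mvn y x1 x2, add(5)
mi estimate: regress y x1 x2
```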

Hope this helps,


