Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: R: Imputation of missing data in an unbalanced panel using ICE


From   "Carlo Lazzaro" <carlo.lazzaro@tiscalinet.it>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: R: Imputation of missing data in an unbalanced panel using ICE
Date   Fri, 25 Oct 2013 17:17:13 +0200

James asked:
"Also, how wrong is it to use only the first imputation (M=1) to run the model,
instead of using all the imputations?".

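The approach James proposes would seem to rule out the between-imputation
variance component (that is, the variance across the M datasets generated
via MI), which is a defining feature of MI. With a single imputation, the
standard errors would not reflect the uncertainty due to the missing data.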
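
To make the point concrete, here is a minimal Python sketch of Rubin's rules. The function name pool_rubin and all the numbers are made up for illustration; they are not output from James's data or from any Stata command. With M=1 the between-imputation variance cannot be estimated at all, so the sketch sets it to zero to show what is effectively being assumed.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool M point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)                     # pooled point estimate
    w = mean(variances)                         # within-imputation variance
    # Between-imputation variance; not estimable when m == 1, so it is
    # set to 0 here purely to show what a single imputation ignores.
    b = variance(estimates) if m > 1 else 0.0
    t = w + (1 + 1 / m) * b                     # total variance
    return q_bar, w, b, t

# Hypothetical coefficient estimates and squared SEs from M = 5 imputed datasets
est = [0.50, 0.62, 0.41, 0.55, 0.47]
var = [0.010, 0.012, 0.009, 0.011, 0.010]

print(pool_rubin(est, var))          # all five imputations
print(pool_rubin(est[:1], var[:1]))  # first imputation only: b drops out
```

In the second call the total variance collapses to the within-imputation variance of that one dataset, which is the understatement described above.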

Kind regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of James Bernard
Sent: Friday, 25 October 2013 13:47
To: statalist@hsphsun2.harvard.edu
Subject: st: Imputation of missing data in an unbalanced panel using ICE

Hi all,

I have been using imputation techniques. Stata offers a wide range of
commands to conduct imputation.

I have an unbalanced panel data set. Several variables have missing values.
To benefit from the fact that the available observations of a variable at
certain times can help estimate its missing values at other times, I changed
the format of my data from long to wide and ran ICE following the
instructions from this site:
http://www.ats.ucla.edu/stat/stata/faq/mi_longitudinal.htm

These instructions work for a balanced panel data set where all firms are
supposed to have values in all years.

But imagine that one firm is supposed to have values for 2000-2003, and
another for 1998-2003. And suppose we have a variable (X) for which some
observations across these two firms are missing:

Firm    Year     X
----    ----    --
A       2000     .
A       2001    10
A       2002     6
A       2003     .

B       1998     3
B       1999     .
B       2000     .
B       2001     4
B       2002     6
B       2003     2

Reshaping the data from long to wide would create 6 new variables named
"X1998", "X1999", ..., "X2003", and the values of X1998 and X1999 would be
missing for firm A.
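
The reshape described above can be sketched in plain Python, using only the example data from this post. The helper name to_wide is hypothetical, and this is only an illustration of the bookkeeping, not of Stata's -reshape- or of ICE itself.

```python
# Long-format panel from the example; None marks a missing value.
long_rows = [
    ("A", 2000, None), ("A", 2001, 10), ("A", 2002, 6), ("A", 2003, None),
    ("B", 1998, 3), ("B", 1999, None), ("B", 2000, None),
    ("B", 2001, 4), ("B", 2002, 6), ("B", 2003, 2),
]

def to_wide(rows):
    """Return {firm: {"X<year>": value}}, one key per year seen in the panel."""
    years = sorted({year for _, year, _ in rows})
    wide = {}
    for firm, year, x in rows:
        # Every firm gets a slot for every year, whether or not it was observed.
        wide.setdefault(firm, {f"X{y}": None for y in years})[f"X{year}"] = x
    return wide

wide = to_wide(long_rows)
for firm, values in wide.items():
    print(firm, values)
```

Note that in wide form firm A's X1998 and X1999 (years in which the firm is not supposed to have values) look exactly like X2000 and X2003 (years with genuinely missing values), so an imputation routine has no way to tell them apart.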

Running ICE would then predict values for X1998 and X1999 for both
firms A and B.

The next step is to get the data back into long form and run the -mi-
commands to estimate the model, which use Rubin's rules to combine the
results across the m imputations.

One may argue that I can let ICE predict the values of X1998 and X1999 for
firm A, reshape the data into long format, and then remove the values of X
for firm A in 1998 and 1999, because firm A is not supposed to have values
in those years.
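
A rough Python sketch of that clean-up step follows. The window dictionary, the helper to_long_in_window, and the filled-in numbers are all invented for illustration (they are not ICE output); the sketch assumes each firm's first and last expected year was recorded before reshaping.

```python
# Each firm's observation window, recorded before reshaping (from the example).
window = {"A": (2000, 2003), "B": (1998, 2003)}

# One hypothetical completed (imputed) wide dataset; the imputed numbers
# are invented for illustration only.
imputed_wide = {
    "A": {1998: 5.1, 1999: 4.7, 2000: 8.2, 2001: 10, 2002: 6, 2003: 5.5},
    "B": {1998: 3, 1999: 3.4, 2000: 3.9, 2001: 4, 2002: 6, 2003: 2},
}

def to_long_in_window(wide, window):
    """Reshape to long rows, keeping only firm-years inside each firm's window."""
    rows = []
    for firm, by_year in wide.items():
        first, last = window[firm]
        for year in sorted(by_year):
            if first <= year <= last:      # drop years the firm should not have
                rows.append((firm, year, by_year[year]))
    return rows

trimmed = to_long_in_window(imputed_wide, window)
for row in trimmed:
    print(row)
```

This only shows the filtering; it would have to be repeated for each of the m imputed datasets, and it says nothing about whether the discarded values influenced the imputations that are kept.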

My question is: Does asking ICE to predict values of X1998 and X1999 for
firm A affect the way it predicts the value of X2000 (which is the main
observation we have to impute)?

Does the technique I used make sense?

Also, how wrong is it to use only the first imputation (M=1) to run the
model, instead of using all the imputations?

Thanks,
James
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



