Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: R: Imputation of missing data in an unbalanced panel using ICE


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: R: Imputation of missing data in an unbalanced panel using ICE
Date   Fri, 25 Oct 2013 16:32:30 +0100

There is a bundle of issues here.

Carlo touches on one, which is what multiple imputation does and does
not purport to provide.
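
For anyone who wants the arithmetic behind Carlo's point, here is a minimal sketch of Rubin's combining rules. It is written in Python rather than Stata purely as an illustration. The function name pool_rubin and the numbers fed to it are made up for this example (they are not James's data), and nothing here is a claim about how -mi estimate- is coded internally.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance
    # Between-imputation variance needs at least two imputations.
    # With M = 1 it cannot be estimated; it is set to 0 here to show the consequence.
    b = variance(estimates) if m > 1 else 0.0
    t = w + (1 + 1 / m) * b   # total variance
    return q_bar, w, b, t


# Hypothetical coefficient estimates and squared standard errors from M = 5 imputations.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

print(pool_rubin(estimates, variances))        # all five imputations
print(pool_rubin(estimates[:1], variances[:1]))  # first imputation only
```

With a single imputation there is nothing from which to compute a between-imputation variance, so the total collapses to the within component and the reported standard error is too small.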

Another is that the method being used here -reshape-s panel data to
wide, imputes and then -reshape-s back.
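
To make the unbalanced-panel wrinkle concrete, here is a rough sketch using the toy data from James's post. It is Python, not Stata, and the helper reshape_wide is a hypothetical stand-in for what going to wide layout does to the data; it says nothing about ICE itself.

```python
# Toy data copied from James's example; None marks a missing value of X.
long_rows = [
    ("A", 2000, None), ("A", 2001, 10), ("A", 2002, 6), ("A", 2003, None),
    ("B", 1998, 3), ("B", 1999, None), ("B", 2000, None),
    ("B", 2001, 4), ("B", 2002, 6), ("B", 2003, 2),
]


def reshape_wide(rows):
    """Return ({firm: {year: value}}, padded) where every firm gets every year.

    `padded` holds the (firm, year) cells that had no row in the long data,
    i.e. cells that exist only because the wide layout needs them.
    """
    years = sorted({year for _, year, _ in rows})
    firms = sorted({firm for firm, _, _ in rows})
    observed = {(firm, year): x for firm, year, x in rows}
    wide = {f: {y: observed.get((f, y)) for y in years} for f in firms}
    padded = {(f, y) for f in firms for y in years if (f, y) not in observed}
    return wide, padded


wide, padded = reshape_wide(long_rows)
print(wide["A"])
print(sorted(padded))
```

In the wide layout, firm A's 1998 and 1999 cells look exactly like its genuinely missing 2000 cell. Keeping a record such as padded is one way to drop those cells again after reshaping back to long. It does not settle whether imputing them changes what is imputed for 2000, which is the question of assumptions raised here.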

This really does raise the question of precisely what assumptions are
needed about variations in time to make that legitimate. There's no
white magic independently of whether tacit assumptions match the data
generating process. I've not thought this through either -- I don't do
this stuff -- but I want to send a Hang on there... signal of caution.

No-one seems interested any more in interpolation as a rough family of
methods of filling in gaps in time series. Rather, it is a smooth
method of filling in gaps and raises questions of its own too.
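
For comparison, here is a minimal sketch of within-panel linear interpolation applied to James's toy data. It is Python written for illustration only: the function interpolate_within is hypothetical, and it is meant to mimic, as far as I understand it, the default behaviour of something like -ipolate- (interior gaps filled linearly, no extrapolation at the ends).

```python
def interpolate_within(series):
    """Linearly interpolate interior gaps in a {year: value} mapping.

    Leading and trailing gaps stay None because filling them would be
    extrapolation, not interpolation.
    """
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    filled = dict(series)
    for y in years:
        if series[y] is not None:
            continue
        before = [k for k in known if k < y]
        after = [k for k in known if k > y]
        if before and after:
            y0, y1 = before[-1], after[0]
            frac = (y - y0) / (y1 - y0)
            filled[y] = series[y0] + frac * (series[y1] - series[y0])
    return filled


# Values of X from James's example, one mapping per firm.
firm_a = {2000: None, 2001: 10, 2002: 6, 2003: None}
firm_b = {1998: 3, 1999: None, 2000: None, 2001: 4, 2002: 6, 2003: 2}

print(interpolate_within(firm_a))
print(interpolate_within(firm_b))
```

Firm B's 1999 and 2000 values are filled along the straight line from 3 to 4. Firm A's gaps sit at the ends of its series, so interpolation alone leaves them missing, and the filled values carry no measure of their own uncertainty.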


Nick
njcoxstata@gmail.com


On 25 October 2013 16:17, Carlo Lazzaro <carlo.lazzaro@tiscalinet.it> wrote:
> James asked:
> "Also, how wrong is it to use only the first imputation (M=1) to run the model,
> instead of using all the imputations?".
>
> The approach James proposes would seem to rule out the between-imputation
> variance component (that is, the variance between the M different datasets
> generated via MI), which is a defining feature of MI.
>
> Kind regards,
> Carlo
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of James Bernard
> Sent: Friday, 25 October 2013 13:47
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Imputation of missing data in an unbalanced panel using ICE
>
> Hi all,
>
> I have been using imputation techniques. Stata offers a wide range of
> commands to conduct imputation.
>
> I have an unbalanced panel data set. Several variables have missing values.
> To benefit from the fact that the available observations of a variable at
> certain times can help estimate its missing values at other times, I reshaped
> my data from long to wide and ran ICE following the instructions
> on this site:
> http://www.ats.ucla.edu/stat/stata/faq/mi_longitudinal.htm
>
> These instructions work for a balanced panel data set where all firms are
> supposed to have values in all years.
>
> But imagine that one firm is supposed to have values for 2000-2003, and
> another for 1998-2003. And suppose we have a variable (X) for which some
> observations across these two firms are missing:
>
> Firm    Year    X
> ----    ----    --
> A       2000    .
> A       2001    10
> A       2002    6
> A       2003    .
>
> B       1998    3
> B       1999    .
> B       2000    .
> B       2001    4
> B       2002    6
> B       2003    2
>
> Reshaping the data from long to wide would create 6 new variables named
> "X1998", "X1999", ..., "X2003", and the values of X1998 and X1999 would be
> missing for firm A.
>
> Running ICE would then predict the missing values of X1998 and X1999, for
> firm A as well as firm B.
>
> The next step is to get the data back into long form and run the -mi-
> commands to fit the model, which use Rubin's rules to combine the results
> across the m imputations.
>
> One may argue that I can let ICE predict the values of X1998 and X1999 for
> firm A, reshape the data into long format, and then remove the values of X
> for firm A in 1998 and 1999, because firm A is not supposed to have values
> in those years.
>
> My question is: Does asking ICE to predict values of X1998 and X1999 for
> firm A affect the way it predicts the value of X2000 (which is the main
> observation we have to impute)?
>
> Does the technique I used make sense?
>
> Also, how wrong is it to use only the first imputation (M=1) to run the model,
> instead of using all the imputations?
>
> Thanks,
> James
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>

© Copyright 1996–2018 StataCorp LLC