
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Multiple imputation with panel data

From   Veronica Galassi <[email protected]>
To   <[email protected]>
Subject   Re: st: Multiple imputation with panel data
Date   Fri, 06 Jul 2012 10:33:48 +0100

Dear all,

You cannot even imagine how much I appreciate your advice!!! This is my
very first quantitative research project using Stata, and my dataset is not
exactly what a researcher would expect it to be.
The example Lance gave describes my situation perfectly.
But apart from this variable, which is missing for one entire year, I also
have many other values missing at random for other variables.
So maybe I should first estimate the coefficients of x2003 = b_0 + b_1*y2003
and then use them to predict x2007, as Oliver suggested.
But isn't this the same as extrapolating values for x2007 from the linear
relationship between x and y? If so, maybe I could simply use extrapolation.
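A minimal Python sketch of the two-step approach (Python is used only for illustration; the thread itself works in Stata). It fits x2003 = b_0 + b_1*y2003 by ordinary least squares on the three rows of Lance's toy data quoted further down, then assumes b_0 and b_1 are unchanged in 2007 and computes b_0 + b_1*y2007. In that sense it is indeed the same as extrapolating the 2003 cross-sectional relationship between x and y forward to 2007. The helper name `ols_fit` is hypothetical.

```python
# Sketch: fit x2003 = b_0 + b_1*y2003 by ordinary least squares, then
# assume the coefficients are unchanged in 2007 and predict x2007.
# Data are the three rows of Lance's toy example (illustration only).


def ols_fit(xs, ys):
    """Return (intercept, slope) of the OLS line ys = intercept + slope * xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x2003 = [5, 4, 3]
y2003 = [8, 3, 8]
y2007 = [9, 3, 5]

# Regress x on y in 2003: y2003 is the regressor, x2003 the outcome.
b_0, b_1 = ols_fit(y2003, x2003)

# Key assumption: b_0 and b_1 are the same in 2007 as in 2003.
x2007_hat = [b_0 + b_1 * y for y in y2007]

print(f"b_0 = {b_0:.3f}, b_1 = {b_1:.3f}")
print("predicted x2007:", x2007_hat)
```

With these three rows the fitted slope happens to be exactly zero, so every predicted x2007 equals the 2003 mean of x (4.0): the predictions can only be as informative as the 2003 fit. More generally, because each predicted x2007 is an exact linear function of y2007, regressing y2007 on it would simply recover that line, which is why Oliver warns that the result will not help in explaining y2007.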

I have also read that, to do something closer to what Stata does when it
performs multiple imputation, I could compute the variance of the residuals
from the first regression and then predict x2007. I would then randomly draw
m numbers, multiply each of them by the standard deviation of the residuals,
and add the result to the predicted values of x2007, which would give me m
imputations from my original dataset. Using Rubin's rules I would then
obtain a single pooled result from my m imputations. Do you think this makes
sense?
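A minimal Python sketch of the procedure described above (single-variable stochastic regression imputation followed by Rubin's rules), under assumptions that are not stated in the post: the m random numbers are standard-normal draws, with one independent draw for each missing value in each imputation (adding one draw to every observation would shift the whole column instead of reproducing residual scatter); the analysis run on each completed dataset is, hypothetically, the mean of x2007; and the helper names `ols_fit`, `impute_once` and `rubin_pool` are illustrative. Rubin's rules pool that per-dataset estimate and its variance across the m datasets, rather than averaging the imputed values into one number. Unlike proper MI methods such as Stata's `mi impute regress`, which also draw the regression parameters from their posterior for each imputation, this sketch keeps b_0, b_1 and the residual SD fixed, so it will tend to understate uncertainty.

```python
import math
import random
import statistics


def ols_fit(xs, ys):
    """Return (intercept, slope, residual_sd) for the OLS line ys = intercept + slope * xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    # Residual standard deviation with n - 2 degrees of freedom.
    return intercept, slope, math.sqrt(rss / (n - 2))


def impute_once(y_new, b_0, b_1, sd, rng):
    """One stochastic imputation: prediction plus an independent N(0, sd^2) draw per value."""
    return [b_0 + b_1 * y + sd * rng.gauss(0.0, 1.0) for y in y_new]


def rubin_pool(estimates, within_vars):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(within_vars)     # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between    # total variance
    return q_bar, total


# Lance's toy data (illustration only).
x2003, y2003, y2007 = [5, 4, 3], [8, 3, 8], [9, 3, 5]
b_0, b_1, sd = ols_fit(y2003, x2003)  # regress x2003 on y2003

rng = random.Random(2012)  # fixed seed so the sketch is reproducible
m = 20
estimates, within_vars = [], []
for _ in range(m):
    x2007_j = impute_once(y2007, b_0, b_1, sd, rng)
    # Hypothetical analysis on each completed dataset: the mean of x2007.
    estimates.append(statistics.mean(x2007_j))
    within_vars.append(statistics.variance(x2007_j) / len(x2007_j))

q_bar, total_var = rubin_pool(estimates, within_vars)
print(f"pooled mean of x2007 = {q_bar:.3f}, total variance = {total_var:.3f}")
```

With only three complete 2003 rows the residual SD rests on a single degree of freedom, so this demonstrates the mechanics only, not a credible imputation model.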

Once I have done that, I should try multiple imputation in Stata again to
impute the rest of the dataset, following what Wes was suggesting.



On Fri, 06 Jul 2012 01:42:50 +0200, Oliver Jones
<[email protected]> wrote:
> Hi Veronica,
> if the little data example Lance gave describes your situation, then I
> agree with his conclusion that you cannot impute the missing values.
> To be precise, there is a way to get reasonable values for x2007, but the
> result will not help in explaining y2007! The way I'm talking about is to
> estimate x2003 = b_0 + b_1*y2003, then assume that the parameters didn't
> change over time and calculate b_0 + b_1*y2007, which is your estimate
> for x2007...
> But as Lance said, others might come up with something more helpful...
> Best Oliver
> On 06.07.2012 00:42, Lance Erickson wrote:
>> Veronica,
>> Perhaps I'm misunderstanding your problem, but if you have wide-format
>> data and there are no values for any of the observations in 2007 for one
>> of the variables in the imputation model, with data like...
>> Id	x2003	x2007	y2003	y2007
>> 1	5	.	8	9
>> 2	4	.	3	3
>> 3	3	.	8	5
>> then I don't think that multiple imputation is an option for you. My
>> understanding of MI is more intuitive than technical, but I believe that
>> to impute values for a given variable, there has to be some information
>> about how the variable is distributed. But if, in the example above,
>> x2007 is all missing, then there is no existing information that can
>> inform the estimation of missing values. In other words, MI can't create
>> data that you don't have. (Even though I think people sometimes seem to
>> prefer listwise deletion to MI because it feels like that's exactly what
>> MI is doing.) It can only give you estimates of what the data might be,
>> based on existing values and the relationship of those existing values
>> to other variables in the imputation model. There are many others on
>> Statalist who are substantially better credentialed than I am to answer
>> your question, but that's my take.
>> Best,
>> Lance
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Veronica
>> Galassi
>> Sent: Thursday, July 05, 2012 1:00 PM
>> To: [email protected]
>> Subject: Re: st: Multiple imputation with panel data
>> Hi Oliver,
>> Thank you for your kind reply!
>> I am not quite sure whether I got your hint or not... maybe my
>> explanation was just not clear enough, sorry about that!!!
>> I think my case is slightly different from what you were describing
>> because I am not interested in the missing data between 2003 and 2007.
>> In that case, as you said, I would just fit a line.
>> What I am trying to impute are the missing data within the years 2003
>> and 2007, respectively.
>> And things are made even more complicated by the fact that for the main
>> explanatory variable of my model I only have observations for 2003,
>> not for 2007. That's why I was thinking about multiple imputation!
>> But maybe you are right, I just have too much missing data.
>> Best,
>> Veronica
>> On Thu, 05 Jul 2012 19:27:47 +0200, Oliver Jones <[email protected]> wrote:
>>> Hi Veronica,
>>> I have just one hint: maybe two observations are just not enough to do
>>> the imputation.
>>> Just think about it: I give you a number, e.g. 3.145 percent, for 2003
>>> and a number, e.g. 5.0 percent, for 2007 and ask you what the values
>>> are for the years in between.
>>> Can you imagine some fancy method to figure it out?
>>> I would suspect, assuming you don't have any other information, that
>>> there is no best solution. Maybe you could just draw a line between
>>> the years.
>>> Best
>>> Oliver
>>> *
>>> *   For searches and help try:
>>> *
>>> *
>>> *
>> --
>> MSc Development Economics
>> University of Sussex
>> Mobile: +44 78 5563 0276
>> 14 Auckland Drive,
>> BN2 4JS, Brighton, UK
>> E-mail: [email protected]

MSc Development Economics
University of Sussex 
Mobile: +44 78 5563 0276

14 Auckland Drive,
BN2 4JS, Brighton, UK

E-mail: [email protected]

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index