Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Multiple imputation with panel data

From	Veronica Galassi <[email protected]>
To	<[email protected]>
Subject	Re: st: Multiple imputation with panel data
Date	Fri, 06 Jul 2012 10:33:48 +0100

Dear all,

You cannot imagine how much I appreciate your advice! This is my very
first piece of quantitative research using Stata, and my dataset is not
exactly what a researcher would hope for.

The example given by Lance describes my situation perfectly. But apart
from this variable, which is missing for one entire year, I also have
many other values missing at random in other variables.
So maybe I should first estimate the coefficients of
x2003 = b_0 + b_1*y2003 and then use them to predict x2007, as Oliver
suggested. But isn't this the same as extrapolating values for x2007 by
exploiting the linear relationship between x and y? If so, maybe I could
simply use extrapolation.
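
To make the regression-prediction idea concrete, here is a minimal
sketch in plain Python (not Stata), using the three toy observations
from Lance's example quoted below. The helper name fit_line is
hypothetical, and the key assumption, flagged in the comments, is that
the 2003 coefficients still hold in 2007. Treat it as an illustration of
the arithmetic rather than a recommended procedure.

```python
# Toy data from Lance's example in this thread: x2007 is entirely missing.
x2003 = [5.0, 4.0, 3.0]
y2003 = [8.0, 3.0, 8.0]
y2007 = [9.0, 3.0, 5.0]


def fit_line(predictor, outcome):
    """Ordinary least squares for outcome = b0 + b1 * predictor."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_o = sum(outcome) / n
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predictor, outcome))
    s_pp = sum((p - mean_p) ** 2 for p in predictor)
    b1 = s_po / s_pp
    b0 = mean_o - b1 * mean_p
    return b0, b1


# Step 1: estimate x2003 = b0 + b1 * y2003 on the year with complete data.
b0, b1 = fit_line(y2003, x2003)

# Step 2: ASSUMPTION (not testable here): b0 and b1 are unchanged in 2007.
x2007_hat = [b0 + b1 * y for y in y2007]

print("b0 =", b0, "b1 =", b1)
print("predicted x2007:", x2007_hat)
```

With these three toy rows the estimated slope happens to be zero, so
every predicted x2007 is just the 2003 mean of x. Whatever the slope,
the predictions are an exact linear function of y2007, so they carry no
information beyond what y2007 already contains.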

I have also read that, to get closer to what Stata does when it performs
multiple imputation, I could compute the variance of the residuals from
the first regression and then predict x2007. I would then draw m random
numbers, multiply each by the standard deviation of the residuals (the
square root of that variance), and add the result to the predicted
values of x2007, giving me m imputations of my original dataset. Using
Rubin's rules I would then obtain one single value from my m
imputations. Do you think this makes sense?
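
The procedure described above can be sketched as follows, again in plain
Python on the toy data from Lance's example. Several things here are
assumptions of mine rather than anything stated in the thread: m = 5, the
fixed seed, normally distributed noise, one noise draw per observation
(rather than one per imputation), and the choice of the mean of x2007 as
the quantity to pool. As usually stated, Rubin's rules combine an
estimate and its variance computed separately in each completed dataset;
they do not average the imputed values into one dataset. The sketch also
keeps b0 and b1 fixed, whereas proper multiple imputation would also
draw the regression parameters, so the pooled variance here is likely
too small.

```python
import random
import statistics

# Toy data from Lance's example in this thread.
x2003 = [5.0, 4.0, 3.0]
y2003 = [8.0, 3.0, 8.0]
y2007 = [9.0, 3.0, 5.0]
n = len(x2003)

# OLS for x2003 = b0 + b1 * y2003.
mean_y = sum(y2003) / n
mean_x = sum(x2003) / n
b1 = (sum((y - mean_y) * (x - mean_x) for y, x in zip(y2003, x2003))
      / sum((y - mean_y) ** 2 for y in y2003))
b0 = mean_x - b1 * mean_y

# Residual standard deviation with n - 2 degrees of freedom.
residuals = [x - (b0 + b1 * y) for x, y in zip(x2003, y2003)]
sigma = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5

# ASSUMPTIONS: m, the seed, and normal noise are illustrative choices.
m = 5
rng = random.Random(2012)
predicted = [b0 + b1 * y for y in y2007]
imputations = [[p + sigma * rng.gauss(0.0, 1.0) for p in predicted]
               for _ in range(m)]

# Analyse each completed dataset; here the estimate is the mean of x2007.
estimates = [statistics.mean(d) for d in imputations]
within = [statistics.variance(d) / n for d in imputations]

# Rubin's rules: pooled estimate, within, between, and total variance.
pooled_estimate = statistics.mean(estimates)
within_var = statistics.mean(within)
between_var = statistics.variance(estimates)
total_var = within_var + (1 + 1 / m) * between_var

print("pooled estimate:", pooled_estimate)
print("total variance:", total_var)
```

In Stata itself, the mi suite (for example mi impute regress followed by
mi estimate) handles the drawing and pooling steps, but it still needs
some observed values of the variable being imputed.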

Once I have done that, I should try again to perform multiple imputation
in Stata to impute the rest of the dataset following what Wes was
suggesting.

Cheers,

Veronica


On Fri, 06 Jul 2012 01:42:50 +0200, Oliver Jones
<[email protected]> wrote:
> Hi Veronica,
> 
> if the little data example Lance gave describes your situation, then I
> agree with his conclusion that you cannot impute the missing values.
> 
> To be precise, there is a way to get reasonable values for x2007, but
> the result will not help in explaining y2007! The way I'm talking about
> is to estimate x2003 = b_0 + b_1*y2003, then assume that the parameters
> didn't change over time and calculate b_0 + b_1*y2007, which is your
> estimate for x2007...
> 
> But as Lance said, others might come up with something more helpful...
> 
> Best Oliver
> 
> On 06.07.2012 00:42, Lance Erickson wrote:
>> Veronica,
>>
>> Perhaps I'm misunderstanding your problem, but if you have wide format
>> data and there are no values for any of the observations in 2007 for
>> one of the variables in the imputation model, with data like...
>>
>> Id	x2003	x2007	y2003	y2007
>> 1	5	.	8	9
>> 2	4	.	3	3
>> 3	3	.	8	5
>>
>> then I don't think that multiple imputation is an option for you. My
>> understanding of MI is more intuitive than technical, but I believe
>> that to impute values for a given variable, there has to be some
>> information about how the variable is distributed. If, as in the
>> example above, x2007 is all missing, then there is no existing
>> information that can inform the estimation of missing values. In other
>> words, MI can't create data that you don't have. (Even though I think
>> people sometimes seem to prefer listwise deletion to MI because it
>> feels like that's exactly what MI is doing.) It can only give you
>> estimates of what the data might be based on existing values and the
>> relationship of those existing values to other variables in the
>> imputation model. There are many others on Statalist who are
>> substantially better credentialed than I am to answer your question,
>> but that's my take.
>>
>> Best,
>> Lance
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Veronica Galassi
>> Sent: Thursday, July 05, 2012 1:00 PM
>> To: [email protected]
>> Subject: Re: st: Multiple imputation with panel data
>>
>> Hi Oliver,
>>
>> Thank you for your kind reply!
>>
>> I am not quite sure whether I got your hint or not... maybe my
>> explanation was just not clear enough, sorry about that!
>> I think my case is slightly different from what you were describing
>> because I am not interested in the missing data between 2003 and 2007.
>> In that case, as you said, I would just fit a line.
>> What I am trying to impute are the missing data within the years 2003
>> and 2007 respectively.
>> And things are made even more complicated by the fact that for the main
>> explanatory variable of my model I have observations only for the year
>> 2003 but not for 2007. That's why I was thinking about multiple
>> imputation!
>> But maybe you are right, I just have too many missing data.
>>
>> Best,
>>
>> Veronica
>>
>>
>>
>> On Thu, 05 Jul 2012 19:27:47 +0200, Oliver
>> Jones <[email protected]> wrote:
>>> Hi Veronica,
>>> I have just one hint: Maybe two observations are just not enough to do
>>> the imputation.
>>> Just think about it: I give a number, e.g. 3.145 percent, for 2003 and
>>> a number, e.g. 5.0 percent, for 2007 and ask you what the values are
>>> for the years in between.
>>> Can you imagine some fancy method to figure it out?
>>> I would suspect, under the assumption that you don't have any other
>>> information, that there is no best solution. Maybe you could just draw
>>> a line between the years.
>>>
>>> Best
>>> Oliver
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>

-- 
VERONICA GALASSI
MSc Development Economics
University of Sussex 
Mobile: +44 78 5563 0276

14 Auckland Drive,
BN2 4JS, Brighton, UK

E-mail: [email protected]
