Statalist



Re: st: Backfill Missing Values


From   "Rivera, Paul A." <riveraecon@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Backfill Missing Values
Date   Fri, 17 Apr 2009 07:39:20 -0700

Thanks to all for the help!

I had read the reverse-time bit in the FAQ beforehand, but when I used the gsort command, I got a "Data Not Sorted" error. The [gen neg = -year] approach, however, worked great.

Thanks again,
Paul

Nick Cox wrote:
David Kantor gave another good answer to the same effect. As a footnote: the whole area, including the reversing-time trick, is covered in an FAQ:
FAQ     . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        2/03    How can I replace missing values with previous or
                following nonmissing values?
                http://www.stata.com/support/faqs/data/missing.html

-search missing- would have pointed to this.

Nick
n.j.cox@durham.ac.uk

Austin Nichols wrote:

Something like this?

clear
input year gdpReal gdpNom gdpdefl PPI dPPI
1900 126 . . 25 .
1901 132 . . 27 .08
1902 138 . . 29 .074
1903 142 . . 31 .069
1904 147 41.16 28 32 .032
1905 150 48 32 34 .063
1906 151 49.83 33 35 .029
end
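* single panel here, so create a constant panel id
g id=1
* reverse time so that a lag (l.) points to the following year
g neg=-year
tsset id neg
gen g=gdpdefl
* each newly filled value feeds the next replacement
replace g=l.g/(1+l.dPPI) if mi(g)
g gdpnom=gdpReal*(g/100)
* restore the usual time ordering and list the result
tsset id year
l, noo sep(0)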
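
For the full panel of 57 groups, the same reverse-time idea can be sketched with by-group subscripting instead of tsset. This is only a sketch under assumptions: the panel identifier is taken to be called id (a hypothetical name; the original post does not give one), and the years within each panel are assumed to have no gaps. Because nothing is tsset here, there are no time-series operators to complain about sort order, which is presumably what the gsort attempt ran into.

```stata
* Sketch only: assumes a panel identifier named id (hypothetical)
* and no gaps in year within each panel.
gen neg = -year
gen g = gdpdefl
* Within each panel, latest year first; replace works down the
* observations, so each filled value is reused for the year before it.
bysort id (neg): replace g = g[_n-1]/(1 + dPPI[_n-1]) if missing(g)
* Put the data back in the usual order.
sort id year
drop neg
```

Unlike l. and f., the subscript [_n-1] simply means "the previous observation", so if a panel skips a year this version would silently fill across the gap, whereas the tsset version would leave it missing.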

On Thu, Apr 16, 2009 at 7:46 PM, Rivera, Paul A. <riveraecon@gmail.com> wrote:

I have a panel dataset that looks something like this:

year    gdpReal   gdpNom   gdpdefl    PPI   dPPI
1900    126       .        .          25    .
1901    132       .        .          27    .08
1902    138       .        .          29    .074
1903    142       .        .          31    .069
1904    147       41.16    28         32    .032
1905    150       48       32         34    .063
1906    151       49.83    33         35    .029

The variable I need is gdpNom. Normally, one would obtain this as:
  gdpNom = gdpReal * (gdpdefl/100);
 however, this is clearly not possible, since gdpdefl is missing wherever
gdpNom is missing.

So, I want to estimate gdpdefl using dPPI, something like this:

 gen    gdpdeflFILL = F.gdpdefl/(1+F.dPPI) if gdpdefl==.
 gen     gdpdeflEST = gdpdefl
 replace gdpdeflEST = gdpdeflFILL if gdpdeflEST==.

This would allow me to estimate gdpNom:
 gen     gdpNomEST = gdpNom
 replace gdpNomEST = gdpReal * (gdpdeflEST/100) if gdpNomEST==.

BUT, my brilliant plan falls apart because the lead operator (F) cannot
cascade the way the lag operator (L) can, so I only get an estimate for
the last missing year (1903 in this example).

I feel like there must be an easy solution to this, but I am stuck. My panel
has 57 groups and 150 time periods, so I'd much rather not do this by hand.
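
To make the intended cascade concrete, the fill can be written as a backward recursion and worked through on the sample rows above. The symbol g_t is just shorthand introduced here for gdpdefl in year t (not notation from the post), and the results are rounded to two decimals.

```latex
\begin{aligned}
g_t      &= \frac{g_{t+1}}{1 + \mathit{dPPI}_{t+1}} \\
g_{1903} &= 28 / 1.032 \approx 27.13 \\
g_{1902} &= g_{1903} / 1.069 \approx 25.38 \\
g_{1901} &= g_{1902} / 1.074 \approx 23.63 \\
g_{1900} &= g_{1901} / 1.08 \approx 21.88
\end{aligned}
```

Each step needs the value filled in by the step before it, which is why a single gen based on the original gdpdefl only reaches 1903.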

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
