Statalist



st: Backfill Missing Values


From   "Rivera, Paul A." <riveraecon@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Backfill Missing Values
Date   Thu, 16 Apr 2009 16:46:00 -0700

Hello!

I have a panel dataset that looks something like this:

year    gdpReal   gdpNom   gdpdefl    PPI   dPPI
1900    126       .        .          25    .
1901    132       .        .          27    .08
1902    138       .        .          29    .074
1903    142       .        .          31    .069
1904    147       41.16    28         32    .032
1905    150       48       32         34    .063
1906    151       49.83    33         35    .029

The variable I need is gdpNom. Normally, one would obtain this as:
   gdpNom = gdpReal * (gdpdefl/100);
however, this is clearly not possible since gdpdefl is missing anywhere gdpNom is missing.

So, I want to estimate gdpdefl using dPPI, something like this:

  gen    gdpdeflFILL = F.gdpdefl/(1+F.dPPI) if gdpdefl==.
  gen     gdpdeflEST = gdpdefl
  replace gdpdeflEST = gdpdeflFILL if gdpdeflEST==.

This would allow me to estimate gdpNom:
  gen     gdpNomEST = gdpNom
  replace gdpNomEST = gdpReal * (gdpdeflEST/100) if gdpNomEST==.

BUT, my brilliant plan falls apart because the lead operator (F.) cannot cascade the way the lag operator (L.) can, so I get an estimate for only one year (1903 in the example above).
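
One possible way around this, offered only as a sketch: sort each panel in reverse time order, so that the "next year" becomes the previous observation, and then fill with explicit subscripts instead of F. Because replace works through the observations in order, each filled value is available when the next (earlier) year is computed. The names id and negyear are hypothetical (my data excerpt does not show the panel identifier), and the sketch assumes one observation per panel and year with no gaps in year.

```stata
* Sketch only: id is a hypothetical name for the panel identifier.
* Assumes one observation per id-year and no gaps in year.
gen negyear = -year
bysort id (negyear): gen gdpdeflEST = gdpdefl

* Within each panel the latest year now comes first, so [_n-1] refers
* to the following year. This is the same formula as
* F.gdpdefl/(1+F.dPPI), but filled values carry over to earlier years.
by id (negyear): replace gdpdeflEST = gdpdeflEST[_n-1] / (1 + dPPI[_n-1]) if missing(gdpdeflEST)

* Restore the usual panel order and drop the helper variable.
sort id year
drop negyear
```

With the example rows, the first filled value would be 1903: 28/(1 + .032), or about 27.13, and the earlier years then follow from it in turn. The helper variable negyear is used rather than gsort so that the data are marked as sorted for the by prefix.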

I feel like there must be an easy solution to this, but I am stuck. My panel has 57 groups and 150 time periods, so I'd much rather not do this by hand.

Any help would be much appreciated.

Thanks,
Paul Rivera