
Re: st: Out-of-sample prediction: Time dummy problem


From   Ulrich Kohler <kohler@wz-berlin.de>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Out-of-sample prediction: Time dummy problem
Date   Wed, 14 Sep 2005 11:57:50 +0200

Niko Wrede wrote:
> I estimate a model using an in-sample-dataset (e.g. 1989-1999) and the
> xtabond2 command with time dummies. Then I want to predict and calculate
> residuals using an out-of-sample-dataset (e.g. 2000-2004) for another time
> period.
>
> My problem: Using predict and the out-of-sample-dataset, Stata complains
> that some variables are missing in the dataset, since it cannot find the
> time dummies.
>
> Question: How can I predict using only the estimates of the main model
> variables without the estimates for the time dummies?
>
> My idea was:
> 1. Estimate:
> use insample.dta
> xi: xtabond2 depvar var1 var2 i.year (endogvar, lag(2 .)) ivstyle(var1 var2
> i.year) rob twostep
>
> 2. Predict:
> use outsample.dta
> ... and then some kind of adjust or xpredict ... But it is not working so
> far and I am not sure, if this is the right way.

Niko,

You generated the year dummies on the fly by prefixing -xtabond2- with 
-xi:-. You therefore need to repeat this step in the out-of-sample data: 

. use insample
. xi i.year
. xtabond2 depvar var1 var2 _I* ...
. drop _all
. use outsample
. xi i.year
. predict [whatever]

Note that you do not need to separate the generation of the time dummies 
from the model estimation in the in-sample step. I only did this to show 
the logic.

Your question implies that you don't want to use the year dummies in the 
prediction. I guess this means fixing the year at the reference year. 
In this case you might want to -replace- all year dummies with zero before 
prediction, i.e.

. foreach var of varlist _I* {
. 	replace `var' = 0
. } 
. predict [whatever]
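
One caveat, as a sketch under the assumption that -xi- uses its default 
naming: for an in-sample range of 1989-1999 it creates _Iyear_1990 to 
_Iyear_1999, with 1989 as the omitted reference year. Run on 2000-2004 
data, -xi i.year- would create _Iyear_2001 to _Iyear_2004 instead, so the 
dummies used in the estimation may still be missing. If so, zero-valued 
dummies could be created directly under the in-sample names (adjust the 
year range to your data):

. use outsample, clear
. forvalues y = 1990/1999 {
. 	gen byte _Iyear_`y' = 0
. }
. predict [whatever]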

I am not familiar with -xtabond2-, but I suspect that it also generates 
variables holding lags of the endogenous variables. If so, you will also need 
to generate these variables by hand in the out-of-sample data, i.e. something 
along the lines of 

. gen newvar = l2.varname
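
Time-series operators such as l2. only work once the panel structure has 
been declared with -tsset-. A minimal sketch, where id is a hypothetical 
name for your panel identifier and endogvar_l2 a hypothetical name for the 
new variable:

. tsset id year
. gen endogvar_l2 = l2.endogvar

Note that the second lag will be missing for the first two years of each 
panel unless the out-of-sample file also contains the preceding years.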

Hope that helps,
Uli


-- 
kohler@wz-berlin.de
+49 (030) 25491-361