Re: st: xtmepoisson out of sample prediction error


From   [email protected] (Roberto G. Gutierrez, StataCorp)
To   [email protected]
Subject   Re: st: xtmepoisson out of sample prediction error
Date   Wed, 09 Feb 2011 17:45:24 -0600

Jibonayan Raychaudhuri <[email protected]> asks:

> I have estimated a random intercept Poisson regression model using
> xtmepoisson in Stata :

> xtmepoisson y x1 x2 x3, exposure(expvar) || district:, irr

> I now want to carry out an out-of-sample prediction for y using the
> estimated parameters of the above model.

> This is what I have done:

> Step 1: Estimate the model

> Step 2: predict b*,reffects

> Step 3: preserve

> Step 4: use newdata.dta, clear (newdata.dta has data on x1, x2, and x3 and
> the exposure variable expvar only; this is the out-of-sample data)

> Step 5: predict n (this is to predict mean count of y for newdata.dta)

> However Stata gives me an error message which says "variable y not found
> r(111)"

> The reason I have used xtmepoisson instead of xtpoisson, normal is
> that I want the predicted mean count to be based on both fixed and random
> effects. This is easy for an in-sample prediction. However, as I
> mentioned, it is not working for an out-of-sample prediction. I know
> that this works if I set the random effect to 0, but this is not what I want.
> Can someone tell me why I am getting this error message? Any help would
> be greatly appreciated.
 
If you wish to make predictions after -xtmepoisson- that incorporate
random effects then you need to keep the estimation data in memory.  The
estimated random effects are calculated from the estimation data, and
this takes place both when you fit the model and when you make
predictions.

As such, rather than replace your estimation data with new prediction
data, you want to append the prediction data to the estimation data.

Your prediction data must contain values for all the covariates in your
model AND a value for the group variable (-district- in your example)
that is represented in the estimation data.  Because your prediction
will include a random effect, Stata needs to know which group's random
effect to use and cannot infer one for a group not represented at
estimation.
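
For the model in the question, a minimal sketch of this workflow might look like the following. It assumes that newdata.dta has been given a -district- variable whose values all appear in the estimation data (the question says the file currently holds only x1, x2, x3, and expvar), and that y is absent or missing in the appended rows. The marker variable name newobs is hypothetical.

```stata
* Fit the model with the estimation data in memory
* (exposure() is given as an option of the fixed-effects equation)
xtmepoisson y x1 x2 x3, exposure(expvar) || district:, irr

* Append the out-of-sample rows instead of replacing the data.
* ASSUMPTION: newdata.dta contains x1, x2, x3, expvar, and district.
* newobs (hypothetical name) is 0 for estimation rows, 1 for appended rows.
append using newdata, generate(newobs)

* Predicted mean counts, including each district's estimated random effect
predict n

* Look at the out-of-sample predictions only
list district x1 x2 x3 expvar n if newobs == 1
```

Keeping a marker such as newobs makes it easy to separate the new rows from the estimation rows after prediction.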

Here is an example using the Bangladesh Fertility Survey, where I fit an
-xtmepoisson- model then append one new observation to predict on:

   . webuse bangladesh, clear
   . gen children = child1 + 2*child2 + 3*child3    // No. of children
   . xtmepoisson children c_use age urban || district:

There are 1,934 observations in these data, so I add one more:

   . set obs 1935

I then set the covariates for this new observation:

   . replace c_use = 0 in 1935
   . replace age = 0 in 1935			// age is mean-centered
   . replace urban = 1 in 1935

Don't forget the group variable if you want to incorporate random effects:

   . replace district = 40 in 1935

Now you can predict for both the estimation data and the new observation:

   . predict n

Profit

   . list c_use age urban district n in 1935
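
To see how much the district's random effect contributes to this prediction, one could also compute the prediction from the fixed portion of the model alone, which corresponds to setting the random effect to zero as mentioned in the question. This is only a sketch continuing the example above; the variable name n_fixed is hypothetical.

```stata
* Prediction from the fixed portion only (random effects set to zero)
predict n_fixed, fixedonly

* Compare with the prediction that includes district 40's random effect
list district n n_fixed in 1935
```

Any difference between n and n_fixed for the new observation reflects the estimated random effect for its district.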

--Bobby
[email protected]

