# Re: st: R2 in areg and R2 in xtreg

From: Ulrich Kohler; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: R2 in areg and R2 in xtreg; Date: Tue, 29 May 2007 08:53:55 +0200

```
Herve STOLOWY wrote:
> There was a recent exchange of mails
> (http://www.stata.com/statalist/archive/2007-02/msg00617.html)
> concerning the difference between the adjusted R-square obtained with
> -areg- and the overall r-square obtained with -xtreg-.
>
> I ran the following regressions:
>
> xi: xtreg abs_SDA1 i.n_dvlpt4*suivi2 roa abs_roa chg_roa size lev growth ,
> fe, if outlier1!=1
>
> xi: areg abs_SDA1 i.n_dvlpt4*suivi2 roa abs_roa chg_roa size lev growth ,
> a(gvkey) , if outlier1!=1
>
> I get exactly the same figures (coefficients, F...). The only difference
> is related to R2: R2 overall = 2.53% with xtreg and Adjusted R-square
> = 25.03% with areg.
>
> What is the difference between the two R2? To be honest, I do not clearly
> understand the explanation given by Mark in the answer to the previous
> thread: "I suppose what is happening is that the areg R2 and adj R2 refer
> to the total variation in the dependent variable, whereas the FE R2 and
> adj R2 refer to the variation in the demeaned dependent variable".
>
> What does this mean? (I do not have the Stata manual on Panel data
> regressions).

It probably helps to recalculate the various r^2 by hand from a standard linear
regression.

Let's first fit the regression models with -areg- and -xtreg-:

. use http://www.stata-press.com/data/kk/beatles, clear
. areg lsat age, absorb(persnr)
. xtreg lsat age, i(persnr) fe

Now let's recalculate these models with a simple linear regression that
includes a dummy variable for each person except the first. This works here
because the dataset is small and has only a few persons:

. tab persnr, gen(d)
. reg lsat age d2 d3 d4

This model already reports the r^2 of -areg-, but here is another way to get
it:

. predict yh
. corr lsat yh
. di r(rho)^2

Hence, it is the squared correlation coefficient between the predicted values
and the dependent variable.

The "between r^2" of -xtreg- can be calculated from the standard regression as
follows:

. egen mlsat=mean(lsat), by(persnr)
. egen mage=mean(age), by(persnr)
. gen yh_between = _b[_cons] + _b[age]*mage
. corr mlsat yh_between
. di r(rho)^2

That is, we predict the person means of the dependent variable using only the
between-person differences in the independent variable, and compare these
predictions with the observed person means.

The "within r^2" uses only the within-person variation in the independent
variable to predict the observed within-person deviations of the dependent
variable:

. gen lsat_demeaned = lsat - mlsat
. gen yh_within = _b[_cons] + _b[age]*(age-mage)
. corr lsat_demeaned yh_within
. di r(rho)^2

Finally, the "overall r^2" uses all the information in the independent
variables to predict the observed values. Note, however, that the fixed effects
are again not used for the prediction:

. gen yh_overall = _b[_cons] + _b[age]*age
. corr lsat yh_overall
. di r(rho)^2

To sum up: the -areg- r^2 is most similar to the overall r^2. The difference
is that -areg- uses the fixed effects for the prediction, while -xtreg- does
not.

Many regards

Uli

--
Ulrich Kohler
kohler@wzb.eu
030/25491-361
```
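You can check the four by-hand calculations in the message above outside Stata. Below is a minimal sketch in Python (standard library only), not Stata's internal code. It assumes a single regressor and a made-up toy panel. The names `ids`, `x` and `y` are hypothetical stand-ins for `persnr`, `age` and `lsat`, and `corr2` and `fe_r2` are illustrative helpers, not part of any package.

The slope b comes from the within (demeaned) data, which gives the same slope as the dummy-variable regression. Each r^2 is then a plain (not adjusted) squared correlation:

$$
\begin{aligned}
r^2_{\text{areg}} &= \operatorname{corr}\big(y_{it},\ \hat a_i + b\,x_{it}\big)^2, &
r^2_{\text{within}} &= \operatorname{corr}\big(y_{it}-\bar y_i,\ b\,(x_{it}-\bar x_i)\big)^2,\\
r^2_{\text{between}} &= \operatorname{corr}\big(\bar y_i,\ b\,\bar x_i\big)^2, &
r^2_{\text{overall}} &= \operatorname{corr}\big(y_{it},\ b\,x_{it}\big)^2,
\end{aligned}
$$

Here $\hat a_i = \bar y_i - b\,\bar x_i$ are the person effects. The constant `_b[_cons]` from the Stata steps is dropped, because adding a constant to a prediction does not change its correlation with the outcome.

```python
from statistics import mean


def corr2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab * sab / (saa * sbb)


def fe_r2(ids, y, x):
    """Recompute the areg-style, within, between and overall r^2 for a
    regression of y on a single regressor x with one fixed effect per id."""
    groups = sorted(set(ids))
    # Person means, like -egen mlsat=mean(lsat), by(persnr)-.
    ybar = {g: mean(v for i, v in zip(ids, y) if i == g) for g in groups}
    xbar = {g: mean(v for i, v in zip(ids, x) if i == g) for g in groups}
    my = [ybar[i] for i in ids]
    mx = [xbar[i] for i in ids]
    y_dm = [v - m for v, m in zip(y, my)]
    x_dm = [v - m for v, m in zip(x, mx)]
    # Within slope: the same value as the slope in the dummy-variable regression.
    b = sum(u * v for u, v in zip(x_dm, y_dm)) / sum(u * u for u in x_dm)
    # Person intercepts a_i = ybar_i - b * xbar_i (the absorbed fixed effects).
    alpha = {g: ybar[g] - b * xbar[g] for g in groups}
    # Fitted values that DO use the fixed effects, like -predict yh- after -reg-.
    yh = [alpha[i] + b * v for i, v in zip(ids, x)]
    return {
        "b": b,
        "yh": yh,
        "y_dm": y_dm,
        "areg": corr2(y, yh),
        "within": corr2(y_dm, [b * v for v in x_dm]),
        "between": corr2(my, [b * v for v in mx]),
        "overall": corr2(y, [b * v for v in x]),
    }


# Hypothetical toy panel: three persons, each observed four times.
ids = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
x = [20, 22, 24, 26, 30, 32, 34, 36, 40, 42, 44, 46]
y = [5, 6, 6, 7, 8, 7, 9, 9, 4, 5, 5, 6]

res = fe_r2(ids, y, x)
for name in ("areg", "within", "between", "overall"):
    print(f"{name:8s} r^2 = {res[name]:.4f}")
```

Only the first measure uses the estimated person effects, and that is the whole difference between the -areg- r^2 and the -xtreg- overall r^2. With this toy panel the areg-style r^2 comes out far larger than the overall r^2. The reason is that most of the variation in y lies between persons, and the person effects capture it.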