Statalist The Stata Listserver


Re: st: R2 in areg and R2 in xtreg

From   Ulrich Kohler <>
Subject   Re: st: R2 in areg and R2 in xtreg
Date   Tue, 29 May 2007 08:53:55 +0200

Herve STOLOWY wrote:
> There was a recent exchange of mails
> ( e/2007-02/msg00617.html
> ) concerning the difference between the adjusted R-square obtained with
> -areg- and the overall r-square obtained with -xtreg-.
> I run the following regressions:
> xi: xtreg abs_SDA1 i.n_dvlpt4*suivi2 roa abs_roa chg_roa size lev growth ,
> fe, if outlier1!=1
> xi: areg abs_SDA1 i.n_dvlpt4*suivi2 roa abs_roa chg_roa size lev growth ,
> a(gvkey) , if outlier1!=1
> I get exactly the same figures (coefficients, F...). The only difference
> is related to R2: R2 overall = 2.53% with xtreg and Adjusted R-square
> = 25.03% with areg.
> What is the difference between the two R2? To be honest, I do not clearly
> understand the explanation given by Mark in the answer to the previous
> thread: "I suppose what is happening is that the areg R2 and adj R2 refer
> to the total variation in the dependent variable, whereas the FE R2 and
> adj R2 refer to the variation in the demeaned dependent variable".
> What does this mean? (I do not have the Stata manual on Panel data
> regressions).

It probably helps to recalculate the various r^2 by hand from a standard linear regression.

Let's first calculate the regression-models with -areg- and -xtreg-:

. use, clear
. areg lsat age, absorb(persnr)
. xtreg lsat age, i(persnr) fe
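
For the comparisons that follow it may help to have the reported values at 
hand. A minimal sketch, assuming the stored results documented for these 
commands (e(r2) and e(r2_a) after -areg-; e(r2_w), e(r2_b) and e(r2_o) after 
-xtreg, fe-); the models are simply re-run quietly:

. quietly areg lsat age, absorb(persnr)
. di "areg:  r^2 = " e(r2) "  adj. r^2 = " e(r2_a)
. quietly xtreg lsat age, i(persnr) fe
. di "xtreg: within = " e(r2_w) "  between = " e(r2_b) "  overall = " e(r2_o)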

Now let's recalculate these models with a simple linear regression. This 
works, because the dataset is so small: 

. tab persnr, gen(d)
. reg lsat age d2 d3 d4

This model already shows the r^2 of -areg-, but here is another way to get it:

. predict yh
. corr lsat yh
. di r(rho)^2

Hence, it is the squared correlation coefficient between the predicted values 
and the dependent variable.
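
The same number can also be read as a share of the total sum of squares. A 
small sketch, assuming the -reg- with the person dummies is still the last 
estimation (-predict- and -corr- do not replace it), so that e(mss) and 
e(rss), the model and residual sums of squares stored by -regress-, are still 
available:

. di e(mss)/(e(mss) + e(rss))

This is the sense in which the -areg- r^2 refers to the total variation in 
lsat: whatever the person dummies explain counts as explained variation.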

The "between r^2" of -xtreg- can be calculated from the standard regression as follows:

. egen mlsat=mean(lsat), by(persnr)
. egen mage=mean(age), by(persnr)
. gen yh_between = _b[_cons] + _b[age]*mage
. corr mlsat yh_between
. di r(rho)^2

That is, we predict the person means of the dependent variable using only the 
differences in the independent variable between persons, and compare these 
predictions to the observed differences between persons.

The "within r^2" is calculated by using only the differences in the 
independent variable within persons to predict the observed within variance:

. gen lsat_demeaned = lsat - mlsat
. gen yh_within = _b[_cons] + _b[age]*(age-mage)
. corr lsat_demeaned yh_within
. di r(rho)^2

Finally, the "overall r^2" uses all the information in the independent 
variables to predict the observed values. Note, however, that the fixed effects 
are again not used for the prediction:

. gen yh_overall = _b[_cons] + _b[age]*age
. corr lsat yh_overall
. di r(rho)^2
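
As a cross-check on the within r^2 (a sketch, not part of the calculation 
above): a regression of the demeaned dependent variable on the demeaned 
independent variable should reproduce it. The variable age_demeaned is a 
helper introduced here for illustration only. The check comes last because 
-reg- replaces the stored coefficients used in the -gen- commands above:

. gen age_demeaned = age - mage
. reg lsat_demeaned age_demeaned
. di e(r2)

The standard errors of this regression are not those of -xtreg-, because its 
degrees of freedom ignore the estimated person means; only the coefficient 
and the r^2 are of interest here.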

To sum up: the -areg- r^2 is most similar to the overall r^2. The difference 
is that -areg- uses the fixed effects for the prediction, while -xtreg- does 
not.

Many regards


Ulrich Kohler