Herve STOLOWY wrote:
> There was a recent exchange of mails
> (http://www.stata.com/statalist/archive/2007-02/msg00617.html
> ) concerning the difference between the adjusted R-square obtained with
> -areg- and the overall r-square obtained with -xtreg-.
>
> I run the following regressions:
>
> xi: xtreg abs_SDA1 i.n_dvlpt4*suivi2 roa abs_roa chg_roa size lev growth ,
> fe, if outlier1!=1
>
> xi: areg abs_SDA1 i.n_dvlpt4*suivi2 roa abs_roa chg_roa size lev growth ,
> a(gvkey) , if outlier1!=1
>
> I get exactly the same figures (coefficients, F...). The only difference
> is related to R2: R2 overall = 2.53% with xtreg and Adjusted R-square
> = 25.03% with areg.
>
> What is the difference between the two R2? To be honest, I do not clearly
> understand the explanation given by Mark in the answer to the previous
> thread: "I suppose what is happening is that the areg R2 and adj R2 refer
> to the total variation in the dependent variable, whereas the FE R2 and
> adj R2 refer to the variation in the demeaned dependent variable".
>
> What does this mean? (I do not have the Stata manual on Panel data
> regressions).
It probably helps to recalculate the various r^2 by hand from a standard linear
regression.
Let's first estimate the regression models with -areg- and -xtreg-:
. use http://www.stata-press.com/data/kk/beatles, clear
. areg lsat age, absorb(persnr)
. xtreg lsat age, i(persnr) fe
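The values we want to reproduce by hand can also be displayed directly from
the stored results. This is only a minimal sketch, assuming the stored-result
names e(r2) and e(r2_a) for -areg- and e(r2_w), e(r2_b) and e(r2_o) for
-xtreg, fe-; check -ereturn list- if your Stata version differs:

```stata
* Load the example data and fit both models, as above
use http://www.stata-press.com/data/kk/beatles, clear

* -areg-: r^2 and adjusted r^2, both computed with the absorbed dummies
areg lsat age, absorb(persnr)
display "areg r^2          = " e(r2)
display "areg adjusted r^2 = " e(r2_a)

* -xtreg, fe-: within, between and overall r^2
xtreg lsat age, i(persnr) fe
display "xtreg within r^2  = " e(r2_w)
display "xtreg between r^2 = " e(r2_b)
display "xtreg overall r^2 = " e(r2_o)
```

These are the numbers that the hand calculations below are meant to match.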
Now let's recalculate these models with a simple linear regression. This
works, because the dataset is so small:
. tab persnr, gen(d)
. reg lsat age d2 d3 d4
This model already shows the r^2 of -areg-, but here is another way to get
it:
. predict yh
. corr lsat yh
. di r(rho)^2
Hence, it is the squared correlation coefficient between the predicted values,
which include the person dummies, and the dependent variable.
The "between r^2" of -xtreg- can be calculated from the standard regression as
follows:
. egen mlsat=mean(lsat), by(persnr)
. egen mage=mean(age), by(persnr)
. gen yh_between = _b[_cons] + _b[age]*mage
. corr mlsat yh_between
. di r(rho)^2
That is, we predict the observed values by only using the differences in the
independent variables between persons and compare them to the observed
differences between persons.
The "within r^2" is calculated by using only the differences in the
independent variable within persons to predict the observed within variance:
. gen lsat_demeaned = lsat - mlsat
. gen yh_within = _b[_cons] + _b[age]*(age-mage)
. corr lsat_demeaned yh_within
. di r(rho)^2
Finally, the "overall r^2" uses all of the information in the independent
variable to predict the observed values. Note, however, that the fixed effects
are again not used for the prediction:
. gen yh_overall = _b[_cons] + _b[age]*age
. corr lsat yh_overall
. di r(rho)^2
To sum up: the -areg- r^2 is most similar to the overall r^2. The difference
is that -areg- uses the fixed effects for the prediction, while -xtreg- does
not. That is presumably why -areg- reports the much larger value in your
output.
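The same point can be checked without the dummy regression by using -predict-
after -xtreg, fe-. The following is a sketch, assuming the -xb- and -xbu-
options of -predict- (linear prediction without and with the estimated fixed
effect); the variable names yh_xb and yh_xbu are made up for illustration:

```stata
* Reload the data and fit the fixed-effects model
use http://www.stata-press.com/data/kk/beatles, clear
xtreg lsat age, i(persnr) fe

* Prediction from the constant and age only (fixed effects not used)
predict double yh_xb, xb

* Prediction that adds the estimated fixed effect of each person
predict double yh_xbu, xbu

* Squared correlation without fixed effects: compare with the overall r^2
corr lsat yh_xb
display r(rho)^2

* Squared correlation with fixed effects: compare with the r^2 of -areg-
corr lsat yh_xbu
display r(rho)^2
```

If the two displayed values agree with the overall r^2 of -xtreg- and the r^2
of -areg- respectively, the only thing separating the two figures is whether
the person effects enter the prediction.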
Many regards
Uli
--
Ulrich Kohler
kohler@wzb.eu
030/25491-361