
st: Re: R-squared for xtreg, re

From   "Nick Cox" <>
To   <>
Subject   st: Re: R-squared for xtreg, re
Date   Thu, 18 Sep 2003 00:24:29 +0100

Mehmet Beceren

> Does anybody know,
> if there is a way to make Stata give us the R-squared for
> the random
> effects panel regressions? Or, is it nonsense to report
> them for random
> effects estimations?

Questions of this form arise quite frequently.

I'll essay a general and discursive answer, and let others
chime in with technicalities, especially if the technicalities
invalidate what I'm going to say, either in general or with
particular models.

1. If Stata refuses to give you an R-squared, there may be a
good explanation, other than that the developers never got
around to implementing this. Perhaps the R-squared doesn't
seem to be a good measure for this model, on some technical
grounds. You have to consult the advanced literature or an
expert to take this further, unless you yourself are an
expert, in which case you probably disagree with the other experts.

2. There is usually something you can do for yourself:
calculate the correlation between the observed response and
the predicted response, and then square it. Here is the
general idea:

. sysuse auto
. regress weight length
. predict weightp if e(sample)
. corr weight weightp if e(sample)
. di r(rho)^2

Naturally, in this example, you get an R-squared anyway,
so you need not do this. But you can check here that you
get the result you think you should get.
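
One way to make that check explicit, as a sketch assuming the
auto data shipped with Stata. The -double- and the tolerance
are illustrative choices, made only to keep rounding in the
stored predictions out of the comparison:

. sysuse auto, clear
. regress weight length
. predict double weightp if e(sample)
. corr weight weightp if e(sample)
. * squared correlation, then the R-squared saved by -regress-
. di r(rho)^2
. di e(r2)
. * stops with an error if the two differ beyond rounding
. assert reldif(r(rho)^2, e(r2)) < 1e-6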

Two crucial details to note:

a. The predicted response must be on the same scale
as the response, up to a linear transformation.

b. Use -if e(sample)- to make sure everything
is done for the estimation sample only. (In
this example, the second -if e(sample)- is redundant
given the first, but it does no harm.)
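
Applied to the kind of model in the question, the same recipe
might look like the following. This is a sketch only: it assumes
a recent Stata with -xtset-, the nlswork data fetched by -webuse-,
and an arbitrary choice of covariates. As far as I understand the
documentation, the "overall" R-squared that -xtreg- leaves
in e(r2_o) is defined as just this squared correlation, so the
two displayed numbers should agree up to rounding.

. webuse nlswork, clear
. xtset idcode year
. xtreg ln_wage age tenure, re
. * linear prediction xb, on the same scale as the response
. predict double xbhat if e(sample), xb
. corr ln_wage xbhat if e(sample)
. di r(rho)^2
. di e(r2_o)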

3. For many models, especially those with categorical responses,
there are often several different supposed approximations or
analogues to R-squared. Often they are labelled "pseudo".
Beware that they typically do not agree, even roughly. You
need to look at literature in your field and to realise that
software and papers may often be unclear about precisely
what was calculated. Thus if you apply the recipe in 2 after -logit- you
will find that the result is _not_ what -logit- reports as pseudo
R-squared. (What that actually is does not appear to be
documented, thus exemplifying my assertion.)
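
For what it is worth, the figure that -logit- reports appears to
coincide with McFadden's likelihood-ratio index, 1 - ll/ll_0,
and that can be checked from the saved results. A sketch, assuming
the auto data shipped with Stata and a reasonably recent release;
the model itself is arbitrary:

. sysuse auto, clear
. logit foreign weight mpg
. * default prediction after -logit- is the probability of a 1
. predict double p if e(sample)
. corr foreign p if e(sample)
. * squared correlation between observed and predicted
. di r(rho)^2
. * pseudo R-squared as reported, then 1 - ll/ll_0 by hand
. di e(r2_p)
. di 1 - e(ll)/e(ll_0)

The last two numbers should match each other; the first will in
general differ from both.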

4. Even if you now have an R-squared, it is only a single
figure of merit. Resist the temptation to use it as a weapon or as a
comforter. Your R-squared may be high because your model
codifies tautology or truism; or your R-squared may be low,
but no indictment of your model, if the field is refractory and
your dataset is problematic. There is likely to be a great
deal of information about the limitations of the model, with
implications for how it can be improved, in the detailed
estimation results and residuals you can usually get from Stata.
There is almost no such information in an R-squared.

5. Even if you now have an R-squared, it is at best a descriptive
measure. It takes into consideration only the information
on which it is based, and says nothing about the structure
of the data in any sense (e.g. dependence or group structure).

