
Re: st: Unbalanced panel data + Adjusted R-square

From   Stas Kolenikov <>
Subject   Re: st: Unbalanced panel data + Adjusted R-square
Date   Mon, 9 Feb 2004 15:35:14 -0500 (EST)

--- In statalist, Marcel Normann wrote:
> I'm currently working on an unbalanced panel data set using the
> OLS/Prais-Winsten models with panel-corrected standard errors (xtpcse).
> How can I transform the R-square into an adjusted R-square?
> Is it correct to use the transformation for OLS models (i.e.  adj. R-
> square = 1-(1-R-square)*(n-1)/(n-k); where n=number of observations and
> k=number of parameters; see Gujarati (2002), p. 218)?

My two cents: I would tend to think that R2 only makes sense for iid data.
What most people tend to think about it is that it is "the proportion of
explained variance" -- is that the sense you want to put into it, too?
Well, so what is the "variance", in your case? If you have any sort of
heteroskedasticity, then the very concept of the disturbance variance is
not well defined: the variance of epsilon varies from one observation to
another, so there is no single number to quantify that. In the panel
setting, you would want to think of the between-panel and within-panel
terms (as -xtreg- does), even if you don't assume any heteroskedasticity.
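
To make the between/within split concrete, here is the usual sum-of-squares identity for an unbalanced panel. The notation is my own assumption, not anything -xtreg- prints: y_it is the outcome for panel i at time t, T_i is the number of observations in panel i, \bar{y}_i is the panel mean, and \bar{y} is the mean over all observations.

```latex
\sum_{i}\sum_{t=1}^{T_i} \left(y_{it} - \bar{y}\right)^2
  = \underbrace{\sum_{i}\sum_{t=1}^{T_i} \left(y_{it} - \bar{y}_i\right)^2}_{\text{within}}
  + \underbrace{\sum_{i} T_i \left(\bar{y}_i - \bar{y}\right)^2}_{\text{between}}
```

So "the variance" already splits into two pieces before any regression is fitted, and a single R2 does not say which of the two the model explains.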

Now, what do most people tend to think about the variance? Well, that it
is something that contributes to the standard errors of your estimates.
Here's my small example:

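sysuse auto, clear
* just to have something with the panel structure:
* rep78 plays the panel id, t is the index within each panel
bysort rep78 : generate t = _n
tsset rep78 t
* the same regression under different disturbance assumptions
xtpcse price mpg
xtpcse price mpg, hetonly
xtpcse price mpg, hetonly correlation(ar1)
xtreg price mpg
regress price mpg, cluster(rep78)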

They all give the same point estimate of _b[mpg], but wildly different
standard errors ranging from 31.69 (xtpcse, default settings) to 82.63
(reg, cluster), depending on the assumptions one would want to make about
the disturbance structure. And frankly I would have a difficult time
justifying any one of those standard errors against another; I'd possibly
go along with the largest one to be on the safe side.
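
If you want the five sets of results side by side rather than scrolling through the output, something along these lines should do it. This is only a sketch: the stored names (pcse, pcse_het, pcse_ar1, re, ols_cl) are labels I made up, and the display formats are arbitrary.

```stata
sysuse auto, clear
bysort rep78 : generate t = _n
tsset rep78 t

* fit each model quietly and keep its results under a made-up name
quietly xtpcse price mpg
estimates store pcse
quietly xtpcse price mpg, hetonly
estimates store pcse_het
quietly xtpcse price mpg, hetonly correlation(ar1)
estimates store pcse_ar1
quietly xtreg price mpg
estimates store re
quietly regress price mpg, cluster(rep78)
estimates store ols_cl

* one row for mpg: coefficient and standard error per model
estimates table pcse pcse_het pcse_ar1 re ols_cl, keep(mpg) b(%9.2f) se(%9.2f)
```

The table puts _b[mpg] and _se[mpg] from each set of assumptions in one row, which makes the comparison easier to read.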

Now, what would that have to do with R2? Well, it tries to deal with
variances and uncertainties associated with the fitted regression line,
but as we see from those standard errors, there is no single way to
quantify the uncertainty of the estimates, due to their special structure.
You may still have the R2 reported by the package, or you can compute it
yourself plugging the estimated slopes into the standard textbook formulas
for the simplest multiple regression with iid disturbances, but I am not
sure as to what kind of interpretation that would have.
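
For what it is worth, here is a minimal sketch of that do-it-yourself computation after -xtpcse- on the toy panel from the example. The variable and scalar names (yhat, resid2, rss, tss, n, k, r2) are my own, k counts all estimated coefficients including the constant, and the adjustment is the textbook OLS one from the question. It shows the arithmetic only and does not endorse the interpretation.

```stata
sysuse auto, clear
bysort rep78 : generate t = _n
tsset rep78 t
xtpcse price mpg

* fitted values from the estimated coefficients, estimation sample only
predict double yhat if e(sample), xb

* residual sum of squares
generate double resid2 = (price - yhat)^2 if e(sample)
quietly summarize resid2
scalar rss = r(sum)

* total sum of squares around the sample mean, and the sample size
quietly summarize price if e(sample)
scalar tss = r(Var)*(r(N) - 1)
scalar n   = r(N)

* number of estimated coefficients, constant included
scalar k = colsof(e(b))

scalar r2 = 1 - scalar(rss)/scalar(tss)
display "R2     = " scalar(r2)
display "adj R2 = " 1 - (1 - scalar(r2))*(scalar(n) - 1)/(scalar(n) - scalar(k))
```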

Also, if you are using -xtpcse-, then you must buy the asymptotic nature
of your standard errors. So you should put n=infinity into your formula,
and that gives you a very nice answer: R2 = R2adj.
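
Spelled out with the formula from the question, holding the number of parameters k fixed (I write \bar{R}^2 for the adjusted R2, which is my notation):

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k},
\qquad
\lim_{n\to\infty}\frac{n-1}{n-k} = 1
\quad\Longrightarrow\quad
\bar{R}^2 \to R^2 .
```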

So the bottom line is, the R2, adjusted or not, does not sound to me like
a very interesting thing to look at in the panel setting. I might be wrong
though -- maybe there's something special that you need your R2adj for?

 ---                                    Stas Kolenikov
 --       Ph.D. student in Statistics at UNC-Chapel Hill
 -  --

* This e-mail and all attachments to it are not intended to provide any
* reasonable point of view and was transmitted to you in error. It
* should be immediately deleted by all recipients unless they really
* enjoy communicating with the author :). Other restrictions apply.
