Re: st: adjusted r-squared, regress with pweight
From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: adjusted r-squared, regress with pweight
Date: Thu, 13 May 2010 09:33:44 -0400
reg [pw] is incorrect, until I can figure out how Stata calculates
it. I think that my hand calculation may be incorrect because the
population definition of "mean square error" is not as clear to me as
it was some months ago when I did it. This just reinforces Stas's
conclusion that these concepts are not too meaningful in a complex
survey setting.
Steve
On Thu, May 13, 2010 at 8:59 AM, Steve Samuels <[email protected]> wrote:
> I think that the adjusted R-square reported after -reg- with [pweight]
> is in error and that the displayed R-square is, in fact, the adjusted
> R-square. I ran three weighted regressions (code below).
>
> I also directly calculated the adjusted R-square from svy: reg from
> the weighted estimates of mean square error Ve and population variance
> V: adjusted R-square = 1 - Ve/V. (I agree with Stas that this has
> little practical value when data are heteroskedastic and clustered--it
> refers to
>
> The results were:
>              Displayed R-square   Adjusted R-square
> reg [pw]     0.6300               0.6188 (e(r2_a))
> reg [fw]     0.6300               0.6268 (displayed)
> svy: reg     0.6300               0.6300 (direct)
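
For reference, the two -reg- figures in the table are consistent with the usual degrees-of-freedom adjustment, differing only in the n that enters it. This is a worked check, not a statement of how Stata computes e(r2_a) internally. It assumes k = 2 regressors, 69 observations with nonmissing rep78 in the shipped auto.dta, and a sum of rep78 weights of 235; neither count is stated in the thread.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

% pweights, assuming n = number of observations = 69
\bar{R}^2_{pw} \approx 1 - 0.37 \times \frac{68}{66} \approx 0.6188

% fweights, assuming n = sum of the weights = 235
\bar{R}^2_{fw} \approx 1 - 0.37 \times \frac{234}{232} \approx 0.6268
```

Under these assumptions both reported values follow from the same R-square of 0.6300. The svy: reg figure comes from a different calculation (1 - Ve/V) and is not covered by this check.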
>
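> ************CODE*****************
> sysuse auto, clear
> * pweights: adjusted R-square is not displayed, so read it from e()
> reg mpg length trunk [pw=rep78]
> di e(r2_a)   // adjusted R-square
> * fweights: adjusted R-square appears in the regression header
> reg mpg length trunk [fw=rep78]
>
> * survey design: each observation is its own PSU, weighted by rep78
> svyset _n [pweight=rep78]
> svy: reg mpg length trunk
> **********************************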
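
One way to see which sample size enters the adjustment is to rebuild the adjusted R-square from the stored results after each fit and compare it with e(r2_a). This is a minimal sketch, assuming the textbook formula 1 - (1 - R2)(N - 1)/df_r applies; the scalar name r2a_hand is made up for illustration.

```stata
sysuse auto, clear

* pweights: rebuild adjusted R-square from stored results
quietly regress mpg length trunk [pw=rep78]
scalar r2a_hand = 1 - (1 - e(r2))*(e(N) - 1)/e(df_r)   // hypothetical helper scalar
display "pw: N = " e(N) "  e(r2_a) = " %6.4f e(r2_a) "  by hand = " %6.4f r2a_hand

* fweights: same formula, but e(N) is now the sum of the weights
quietly regress mpg length trunk [fw=rep78]
scalar r2a_hand = 1 - (1 - e(r2))*(e(N) - 1)/e(df_r)
display "fw: N = " e(N) "  e(r2_a) = " %6.4f e(r2_a) "  by hand = " %6.4f r2a_hand
```

If the stored and hand-built values agree in both fits, the difference between the pw and fw rows of the table comes from e(N) alone.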
>
> Steve
>
> --Stas Kolenikov to statalist
> Yes, David, it was asked before a number of times :)). Sum of squares
> and all that ANOVA stuff assumes the normal regression model (i.e.,
> the regression errors follow N(0,sigma^2) distribution). pweights
> imply a probability sampling design, under which no distributional
> assumptions are made, so the ANOVA table is inappropriate. You can
> still compute all the sums of squares, of course, but they may not
> have readily available population analogues; and the distributional
> results for F-tests do not have the exact finite sample interpretation
> anymore (although you'd still be able to get asymptotic Wald tests, I
> imagine).
>
> Likewise, you should not expect these things to show up when you
> specify -robust- or -cluster- standard errors -- you know your data
> are heteroskedastic, so why on earth would you ask for some sort of
> averaged variance?
> Steven Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax:    206-202-4783
>