
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Calculating R2 for pweighted data.


From   Steve Samuels <[email protected]>
To   Judy Lee <[email protected]>, [email protected]
Subject   st: Calculating R2 for pweighted data.
Date   Wed, 28 Aug 2013 10:42:24 -0400

I'm forwarding a private question from Judy Lee, who referred to some
old posts of mine.

Dear Ms. Lee:

I didn't hand-calculate R2, just AdjR2 starting from R2.
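If R2 itself is wanted by hand, here is a sketch. It assumes (this message does not confirm it) that -regress- with weights forms R2 from weighted sums of squares about the weighted mean. Here w_i is the weight (rep78 in this example), yhat_i is the fitted value, and the sums run over the n observations in the estimation sample. Under that assumption, rescaling the weights leaves R2 unchanged. That fits the identical R2 of 0.6300 reported for reg [pw], reg [fw], and svy: reg in the 2010 thread quoted below.

$$
\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} w_i \,(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} w_i \,(y_i - \bar{y}_w)^2}
$$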

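********************************
sysuse auto, clear
reg mpg length trunk [pw=rep78]
di %15.10f e(r2)
di %15.10f e(r2_a)   //adjusted r-square
di %15.10f 1 - (1 - e(r2))*68/66   //by hand: n = 69 obs., p = 2 covariates
di %15.10f 1 - (1 - e(r2))*(e(N) - 1)/(e(N) - e(df_m) - 1)   //same, from e(N) and e(df_m)
*******************************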

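The by-hand line uses the relation
AdjR2 = 1 - (1 - R2)*(n - 1)/(n - p - 1),
where n = number of observations used (69 here, because rep78 is missing for 5 of the 74 cars) and p = number of covariates (2 here).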
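Here is the relation worked through for this example. It uses n = 69, p = 2, and the displayed R2 of 0.6300 from the 2010 thread quoted below. The factor is 68/66, and the result reproduces e(r2_a) to four decimals.

The third line sketches why the 2010 corrections of 69/66 (to the weighted MSE, Ve) and 69/68 (to the weighted variance, V) give the same factor. It assumes that Ve and V are the weighted residual and total sums of squares divided by the same total weight, so that Ve/V = 1 - R2.

With frequency weights, n is instead the sum of the weights (235 for rep78). That matches the 0.6268 shown for reg [fw].

$$
\begin{aligned}
\frac{n-1}{n-p-1} &= \frac{68}{66} \approx 1.0303, \\
\text{AdjR2} &= 1 - (1 - 0.6300)\cdot\frac{68}{66} \approx 1 - 0.3812 = 0.6188, \\
1 - \frac{(69/66)\,V_e}{(69/68)\,V} &= 1 - \frac{68}{66}\cdot\frac{V_e}{V} = 1 - \frac{68}{66}\,(1 - R^2), \\
\text{AdjR2}_{fw} &= 1 - (1 - 0.6300)\cdot\frac{234}{232} \approx 0.6268 .
\end{aligned}
$$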


Please ask any further questions on Statalist, and be sure to read the
FAQ, including the section on private emails. Note that the correct spelling
is "Stata", not "STATA", for the reason given in the FAQ.

Steven Samuels



> On Aug 26, 2013, at 6:07 PM, Judy Lee wrote:
> 
> How do you hand-calculate R^2? I can't seem to match the value from my regression in STATA.
>  
> Re: st: adjusted r-squared, regress with pweight
> 
> From: Steve Samuels <[email protected]>
> To: [email protected]
> Subject: Re: st: adjusted r-squared, regress with pweight
> Date: Thu, 13 May 2010 10:55:02 -0400
> 
> I couldn't leave it for long, and I've managed to hand-calculate the
> adjusted r-square e(r2_a) = 0.6188088229 reported by -reg [pw]-.
> My conclusion stands:  adjusted r-square  from a pweighted regression
> is an estimate of the one that would be obtained from OLS on a SRS of
> the same size.
>  
> Steve
>  
> On Thu, May 13, 2010 at 10:39 AM, Steve Samuels <[email protected]> wrote:
> Well, e(r2_a) was 0.6188, not 0.6218 as in my revised hand calculation, so
> I still have not figured it out! I'll have to leave this for another
> time.
> 
> Steve
> 
> On Thu, May 13, 2010 at 10:14 AM, Steve Samuels <[email protected]> wrote:
>> Okay, I think that I've figured it out, and I apologize for the
>> confusion. The adjusted R-square computed by -reg [pw]- corrects
>> the weighted estimates of the MSE and population variance by the same
>> corrections that would be appropriate for OLS regression on a sample
>> of the same size. For the auto example with two covariates and one
>> intercept, n = 69, and the corrections to MSE and variance are
>> (69/66) and (69/68), respectively. With these corrections, adjusted
>> R-square = 0.6218, the value given in e(r2_a).
>> 
>> These can be interpreted as follows: the unadjusted and adjusted
>> R-squared are estimates of those that would have been reported if one
>> had done OLS on an SRS of n = 69. Adjusted R-squared is not, contrary
>> to my original belief, a "population" estimate of anything.
>> 
>> Steve
>> 
>> 
>> On Thu, May 13, 2010 at 9:33 AM, Steve Samuels <[email protected]> wrote:
>>> I'm going to withdraw my conclusion that the adjusted R-square from
>>> reg [pw] is incorrect until I can figure out how Stata calculates
>>> it. I think that my hand calculation may be incorrect because the
>>> population definition of "mean square error" is not as clear to me as
>>> it was some months ago when I did it. This just reinforces Stas's
>>> conclusion that these concepts are not too meaningful in a complex
>>> survey setting.
>>> 
>>> Steve
>>> 
>>> 
>>> On Thu, May 13, 2010 at 8:59 AM, Steve Samuels <[email protected]> wrote:
>>>> I think that the adjusted r-square reported after -reg- with [pweight]
>>>> is in error and that the displayed R-square is, in fact, the adjusted
>>>> R-square. I ran three weighted regressions (code below).
>>>> 
>>>> I also directly calculated the adjusted r-square from svy: reg from
>>>> the weighted estimates of mean square error Ve and population variance
>>>> V: adjusted R-square = 1 - Ve/V. (I agree with Stas that this has
>>>> little practical value when data are heteroskedastic and clustered; it
>>>> refers to
>>>> 
>>>> The results were:
>>>>               Displayed R-square   Adjusted R-square
>>>> reg [pw]      0.6300               0.6188 (e(r2_a))
>>>> reg [fw]      0.6300               0.6268 (displayed)
>>>> svy: reg      0.6300               0.6300 (direct)
>>>> 
>>>> ************CODE*****************
>>>> sysuse auto,clear
>>>> reg mpg  length trunk [pw=rep78]
>>>> di e(r2_a)   //adjusted r-square
>>>> reg mpg  length trunk [fw=rep78]
>>>> 
>>>> svyset _n [pweight=rep78]
>>>> svy: reg mpg length trunk
>>>> **********************************
>>>> 
>>>> Steve
>>>> 
>>>> --Stas Kolenikov to statalist
>>>> Yes, David, it was asked before a number of times :)). Sum of squares
>>>> and all that ANOVA stuff assumes the normal regression model (i.e.,
>>>> the regression errors follow N(0,sigma^2) distribution). pweights
>>>> imply a probability sampling design, under which no distributional
>>>> assumptions are made, so the ANOVA table is inappropriate. You can
>>>> still compute all the sums of squares, of course, but they may not
>>>> have readily available population analogues; and the distributional
>>>> results for F-tests do not have the exact finite sample interpretation
>>>> anymore (although you'd still be able to get asymptotic Wald tests, I
>>>> imagine).
>>>> 
>>>> Likewise, you should not expect these things to show up when you
>>>> specify -robust- or -cluster- standard errors -- you know your data
>>>> are heteroskedastic, so why on earth would you ask for some sort of
>>>> averaged variance?
>>>> Steven Samuels
>>>> [email protected]
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax:    206-202-4783
>>>> 
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/