
Re: st: adjusted r-squared, regress with pweight

From: Steve Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: adjusted r-squared, regress with pweight
Date: Thu, 13 May 2010 10:55:02 -0400

```
I couldn't leave it for long, and I've managed to hand calculate the
adjusted R-square, e(r2_a) = 0.6188088229, reported by -reg [pw]-.
My conclusion stands: the adjusted R-square from a pweighted regression
is an estimate of the one that would be obtained from OLS on a SRS of
the same size.

Steve

On Thu, May 13, 2010 at 10:39 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Well, e(r2_a) was 0.6188, not 0.6218, my revised hand calculation, so
> I still have not figured it out! I'll have to leave this for another
> time.
>
> Steve
>
> On Thu, May 13, 2010 at 10:14 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Okay, I think that I've figured it out, and I apologize for the
>> confusion. The adjusted R-square computed by -reg [pw]- corrects
>> the weighted estimates of the MSE and population variance by the same
>> corrections that would be appropriate for OLS regression on a sample
>> of the same size. For the auto example with two covariates and one
>> intercept, n = 69, and the corrections to MSE and variance are
>> (69/66) and (69/68), respectively. With these corrections, adjusted
>> R-square = 0.6218, the value given in e(r2_a).
>>
>> The R-squared values are estimates of those that would have been
>> reported if one had done OLS on a SRS of n = 69. Adjusted R-squared
>> is not, contrary to my original belief, a "population" estimate of
>> anything.
>>
>> Steve
>>
>>
>> On Thu, May 13, 2010 at 9:33 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> I'm going to withdraw my conclusion that the adjusted R-square from
>>> reg [pw] is incorrect, until I can figure out how Stata calculates
>>> it. I think that my hand calculation may be incorrect because the
>>> population definition of "mean square error" is not as clear to me as
>>> it was some months ago when I did it.  This just reinforces Stas's
>>> conclusion that these concepts are not too meaningful in a complex
>>> survey setting.
>>>
>>> Steve
>>>
>>>
>>> On Thu, May 13, 2010 at 8:59 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> I think that the adjusted r-square reported after -reg- with [pweight]
>>>> is in error and that the displayed R-square is, in fact, adjusted
>>>> R-square. I ran three weighted regressions (code below).
>>>>
>>>> I also directly calculated the adjusted r-square from svy: reg from
>>>> the weighted estimates of mean square error Ve and population variance
>>>> V: adjusted R-square = 1 - Ve/V. (I agree with Stas that this has
>>>> little practical value when data are heteroskedastic and clustered--it
>>>> refers to
>>>>
>>>> The results were:
>>>> Command      R-square   Adjusted R-square (source)
>>>> reg [pw]     0.6300     0.6188 (e(r2_a))
>>>> reg [fw]     0.6300     0.6268 (displayed)
>>>> svy: reg     0.6300     0.6300 (direct)
>>>>
>>>> ************CODE*****************
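>>>> sysuse auto, clear
>>>> * same model with probability weights, then with frequency weights
>>>> reg mpg length trunk [pw=rep78]
>>>> reg mpg length trunk [fw=rep78]
>>>>
>>>> * treat each observation as its own sampling unit, weighted by rep78
>>>> svyset _n [pweight=rep78]
>>>> svy: reg mpg length trunk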
>>>> **********************************
>>>>
>>>> Steve
>>>>
>>>> --Stas Kolenikov to statalist
>>>> Yes, David, it was asked before a number of times :)). Sum of squares
>>>> and all that ANOVA stuff assumes the normal regression model (i.e.,
>>>> the regression errors follow N(0,sigma^2) distribution). pweights
>>>> imply a probability sampling design, under which no distributional
>>>> assumptions are made, so the ANOVA table is inappropriate. You can
>>>> still compute all the sums of squares, of course, but they may not
>>>> have readily available population analogues; and the distributional
>>>> results for F-tests do not have the exact finite sample interpretation
>>>> anymore (although you'd still be able to get asymptotic Wald tests, I
>>>> imagine).
>>>>
>>>> Likewise, you should not expect these things to show up when you
>>>> specify -robust- or -cluster- standard errors -- you know your data
>>>> are heteroskedastic, so why on earth would you ask for some sort of
>>>> averaged variance?
>>>> Steven Samuels
>>>> sjsamuels@gmail.com
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax:    206-202-4783


```
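
The conclusion of the thread can be checked directly in Stata. The following is a minimal sketch, not Stata's documented computation: it assumes that after -regress- with pweights e(N) holds the number of observations (69 here) and e(df_r) the residual degrees of freedom (66 here), and that e(r2_a) follows the usual OLS formula 1 - (1 - R2)(n - 1)/(n - k). That formula is inferred from the numbers quoted in the thread.

```stata
sysuse auto, clear

* pweighted regression from the thread: two covariates plus an intercept
regress mpg length trunk [pw=rep78]

* Assumed formula: the ordinary OLS adjustment applied to the weighted R-squared,
* using the number of observations rather than the sum of the weights
display "hand-computed: " %12.10f 1 - (1 - e(r2))*(e(N) - 1)/e(df_r)
display "e(r2_a):       " %12.10f e(r2_a)
```

With the rounded R-square of 0.6300 from the thread, n = 69 and k = 3, the formula gives 1 - 0.37 * 68/66, or about 0.6188, which matches the e(r2_a) value reported for -reg [pw]-. If n is instead taken as the sum of the frequency weights (235, assuming the standard distribution of rep78 in auto.dta), the same formula gives about 0.6268, the value displayed after -reg [fw]-.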