
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: multiple regression, r squared and normality of residuals


From   Richard Goldstein <[email protected]>
To   [email protected]
Subject   Re: st: multiple regression, r squared and normality of residuals
Date   Wed, 23 Mar 2011 08:43:52 -0400

I see several issues here, which I touch on before giving an
"answer" to the R-squared question:

1. R-squared for an outcome variable and R-squared for a (non-linearly)
transformed version of that outcome (e.g., log) cannot be compared unless
you use a special version of R-squared (see point 5)

2. in general, R-squared values computed on different N's are not comparable

3. why did N drop? if real zeros became missing under the log transform,
you have added problems

4. if the residuals are normally distributed without the transform, why
transform? (certain answers to this question would turn one to -glm-
with log link)

5. if you really want to compare R-squared values for different versions
of the "same" outcome, there are ways to do it; as the (co-)author of at
least two of these, I recommend -brsq- (use -findit brsq- to find the
program)
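
A minimal sketch of points 1-4 follows. It is not the -brsq- program and
does not try to reproduce its output. It assumes Stata's shipped auto.dta
as a stand-in for the poster's data, with price, mpg and weight as
placeholder names for the real outcome and predictors. The squared
correlation between the observed outcome and the back-transformed
prediction is only one rough way to put both fits on the raw scale.

```stata
* Sketch only: auto.dta stands in for the poster's data;
* price, mpg and weight are placeholder variable names.
sysuse auto, clear

* Point 3: ln() returns missing for zero or negative values,
* so those observations drop out of the logged model.
count if price <= 0
generate double lnprice = ln(price)

* Points 1-2: fit both models on the same observations.
regress price mpg weight if !missing(lnprice)
display "R-squared, raw outcome: " e(r2)

regress lnprice mpg weight
predict double xb if e(sample), xb
generate double yhat = exp(xb)

* One rough comparable measure: squared correlation between the
* observed outcome and the back-transformed prediction.
correlate price yhat
display "Squared correlation on the raw scale: " r(rho)^2

* Point 4: model the log of the mean without transforming the outcome.
glm price mpg weight, family(gaussian) link(log)
```

If the -count- line reports anything other than zero, the drop in N comes
from the transform itself, and the two R-squared values describe different
samples unless the first model is restricted as shown.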

Rich

On 3/22/11 9:11 PM, Arti Pandey wrote:
> 
> Hello
> 
> I ran a multiple regression in Stata using two models.
> The first gave an R-squared of .35; the p-values of all predictors were less than
> 0.001 except one, which was less than 0.05. The no. of obs. used was 84, and the
> distribution of residuals was normal.
> Then I did a log transform of the dependent variable. R-squared went up to .65;
> the p-values for all predictors were 0.001 except the one mentioned above, which is
> now 0.06. The residuals were also slightly skewed to the left. The no. of obs. went
> down to 77.
> My question is how do I decide between the R-squared and the distribution of
> residuals? Is such a large rise in R-squared worth sacrificing observations
> and normally distributed residuals for? Since the skew is not very pronounced,
> it is tempting to go with the second, but then the regression model might be
> wrong.....
> Appreciate any help.
> Arti