
# Re: st: Comparison of the R-squared in a loglog and linear model

From: Austin Nichols
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Comparison of the R-squared in a loglog and linear model
Date: Fri, 18 Jun 2010 11:17:25 -0400

```
Kit et al.--
Duan's smearing method is one approach to dealing with a logged
depvar; a better approach is to use a regression technique that
respects the functional form, like -poisson- (or another member of the
-glm- family). But you still cannot compare the R-squared across
non-nested models and hope to conclude anything about which model is
better from that information alone.  Mean squared prediction error in
levels for the nonzero outcomes seems a reasonable criterion for
rejecting the log(y) regression model below.

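* levpredict is user-written; locate and install it with -findit levpredict-
use http://fmwww.bc.edu/ec-p/data/mus/mus03data, clear
* linear model in levels
qui reg totexp suppins phylim actlim totchr age female income
predict xb
* log model, retransformed to levels without and with the duan option
qui reg ltotexp suppins phylim actlim totchr age female income
levpredict tenorm
levpredict teduan, duan print
* count-data models fit directly to the level outcome
qui poisson totexp suppins phylim actlim totchr age female income
predict tepois
qui nbreg totexp suppins phylim actlim totchr age female income
predict tenbreg
* compare predictions with the observed outcome
su totexp xb te*
su totexp xb te* if totexp>0
corr totexp xb te*
* squared prediction errors in levels, scaled by 1e6
g mse_xb=(totexp-xb)^2/1e6
g mse_norm=(totexp-tenorm)^2/1e6
g mse_duan=(totexp-teduan)^2/1e6
g mse_pois=(totexp-tepois)^2/1e6
g mse_nbreg=(totexp-tenbreg)^2/1e6
su mse*
su mse* if totexp>0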

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      mse_xb |      2955    127.0504    642.6503     .00005   12779.11
    mse_norm |      2955    142.4353    641.0374   3.32e-06   11744.09
    mse_duan |      2955    140.7604    644.1605   .0000549   11842.16
    mse_pois |      2955    128.3255    648.1356   4.52e-06   12841.78
   mse_nbreg |      2955    131.8694    642.3027   2.48e-06   12432.65

For those enamored of scatter plots for this kind of comparison, much
more work is required to get a good picture of fit.  This is one
approach:

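* cube roots compress the long right tail of expenditures
g cr_te=totexp^(1/3)
* xb from the linear model can be negative, hence sign()*abs()
g cr_xb=sign(xb)*abs(xb)^(1/3)
g cr_norm=tenorm^(1/3)
g cr_duan=teduan^(1/3)
g cr_pois=tepois^(1/3)
g cr_nbreg=tenbreg^(1/3)
* cr_* includes cr_te itself, so the plot also shows the 45-degree line
sc cr_* cr_te if totexp>0, msize(1 1 1 1 1 1)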

On Fri, Jun 18, 2010 at 9:47 AM, Christopher Baum <kit.baum@bc.edu> wrote:
> On Jun 18, 2010, at 2:33 AM, Natalie wrote:
>
>> Can I not maybe obtain the antilog predicted values for the log log
>> model and compute the R-squared between the antilog of the observed and
>> predicted values. And then compare this R-square with the R-square
>> obtained from OLS estimation of the linear model?
>>
>> There are other statistical programs that can do this automatically, but
>> as I work with Stata, I'd rather do it with this program.
>
>
> findit levpredict
>
> Generate the level form of the dependent variable (correctly, using this routine) and then
> compute the squared correlation between that and the original level variable. That will be the
> R^2 of the log form of the regression.
```
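
For readers who want the algebra behind the retransformation discussed in this thread, here is a brief sketch. The notation ($y_i$, $x_i$, $\hat\beta$, $\hat\varepsilon_i$, $\hat\sigma^2$, $n$) is introduced here for illustration and does not appear in the original messages. Reading `tenorm` as a normal-theory prediction and `teduan` as Duan's smearing prediction is an assumption based on the variable names and the `duan` option; the -levpredict- help file is the place to confirm what the command actually computes.

```latex
\begin{align*}
\ln y_i &= x_i'\beta + \varepsilon_i, \qquad
  \hat\varepsilon_i = \ln y_i - x_i'\hat\beta
  && \text{(log model, fit on the $n$ observations with $y_i > 0$)} \\
\hat y_i^{\text{naive}} &= \exp(x_i'\hat\beta)
  && \text{(too small on average: } E[e^{\varepsilon} \mid x] \ge e^{E[\varepsilon \mid x]} = 1 \text{ by Jensen)} \\
\hat y_i^{\text{norm}} &= \exp\!\bigl(x_i'\hat\beta + \hat\sigma^2/2\bigr)
  && \text{(assumes homoskedastic } \varepsilon \sim N(0,\sigma^2)\text{)} \\
\hat y_i^{\text{duan}} &= \exp(x_i'\hat\beta)\cdot\frac{1}{n}\sum_{j=1}^{n}\exp(\hat\varepsilon_j)
  && \text{(Duan's smearing factor, no normality assumed)} \\
\text{MSPE} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat y_i\bigr)^2
  && \text{(prediction error measured in levels)}
\end{align*}
```

In the Stata code above, each `mse_*` variable holds one squared error divided by 1e6, so the Mean column of the -summarize- output corresponds to MSPE in millions of squared units of totexp.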