Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: FW: Model SS/R-square in nl

| From | To | Subject | Date |
| --- | --- | --- | --- |
| Steven Samuels | statalist@hsphsun2.harvard.edu | Re: st: FW: Model SS/R-square in nl | Thu, 30 Jun 2011 22:03:54 -0400 |

```
One consequence of the fact that the mean-only model is not nested in the no-constant model is that SSE > SST is possible, so that R-square = 1 - SSE/SST < 0.

In this example y is constant, so SST = 0, whereas SSE > 0. Thus I believe that Gordon is incorrect and that the traditional approach is correct.

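**********************
* y is constant, so the centered total SS is zero
clear
scalar drop _all
range x 0 10 11
gen y = 10
sum x
sum y
* the r() results used below come from -sum y-
scalar n = r(N)
scalar var = r(Var)
scalar sstot = (n-1)*var

scalar list sstot
* no-constant fit: its residual SS is > 0, i.e. larger than sstot
reg y x, nocons
******************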

Steve

Some people agree with you, e.g.

H. A. Gordon. Errors in Computer Packages. Least Squares Regression Through the Origin.
Journal of the Royal Statistical Society. Series D (The Statistician), Vol. 30, No. 1 (Mar., 1981), pp. 23-29.

But others don't, and the "error" is well-established. If you take your point of view, you have to justify an ANOVA table with the following d.f., taking p = 1 regressor.

Source  d.f.
Model   1
Error   n - 1
Total   n - 1 ?

This problem arises because the mean-only model is _not_ nested in the no-constant model as standard LS theory requires.

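You can achieve a "nesting" by taking the model with no mean (no parameters at all) as the baseline, so that the total SS is not centered at the mean, getting:

Source  d.f.
Model   1
Error   n - 1
Total   n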

The main benefit of the "SST must be the same for all models" approach, I think, is that one can compare R2 consistently across models for the same data set, as R2 = 1 - SSE/SST.

Steve
sjsamuels@gmail.com

On Jun 30, 2011, at 4:24 PM, CJ Lan wrote:

If you look at the residual SS, i.e., sum of (yi-yhat)^2, the 1st model renders 28315 and the 2nd model renders 28427, which sounds reasonable because one parameter is eliminated.  My point is the Total SS, i.e., sum of (yi-mean(y))^2, should not be changed (=39434).  Therefore, in the 2nd model, the Model SS = (Total SS)-(residual SS) = 39434-28427 = 11007 and the R2 should have been 0.2791, which is the answer I got from Matlab.

The curve will not be forced through the origin.  The curve of the 1st model starts at (b0+b1=45.6) and decreases at an exponential rate.  Similarly the curve of the 2nd model starts at (b1=44.5) and decreases at a similar exponential rate.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, June 30, 2011 4:05 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: FW: Model SS/R-square in nl

No, it is not a bug.

Your constant may not be significant by itself, but the model is
different. R-squares for different models are often difficult to
compare effectively.

Plot the fitted curves and the data to see what is going on.

In my experience, especially with nonlinear models, it is far better
to rely on physical, biological, economic or other scientific
understanding to choose the better model and to compare fitted curves
with the data, rather than to rely blindly on a significance test.
Does it make sense to force the curve through the origin?

Nick

On Thu, Jun 30, 2011 at 6:06 PM, CJ Lan <CJ@jupiter.fl.us> wrote:
> I was using nl to run a 3-parameter NLS model estimation and got R2=0.28
> (see the first output).  Since the parameter b0 is insignificant, I dropped
> it and re-estimated.  This time, I got the wrong R2 (=0.86 in
> the 2nd output).  It is apparent that either the "Model SS" or "Total
> SS" is wrongly calculated.  Is this a bug?  Thank you for your help.
>
> (1)
> . nl exp3 : passby A in 1/152
> (obs = 152)
> Iteration 0:  residual SS =  29741.65
> Iteration 1:  residual SS =  28448.53
> Iteration 2:  residual SS =  28316.37
> Iteration 3:  residual SS =  28315.61
> Iteration 4:  residual SS =   28315.6
> Iteration 5:  residual SS =   28315.6
> Iteration 6:  residual SS =   28315.6
> Iteration 7:  residual SS =   28315.6
>       Source |       SS       df       MS     Number of obs = 152
> -------------+------------------------------  F(  2,   149) = 29.25
>        Model |  11118.3472     2   5559.1736  Prob > F      = 0.0000
>     Residual |  28315.6009   149   190.03759  R-squared     = 0.2819
>        Total |  39433.9482   151  261.151975  Root MSE      = 13.78541
>                                               Res. dev.     = 1225.905
> 3-parameter asymptotic regression, passby = b0 + b1*b2^A
> ------------------------------------------------------------------------
>       passby |      Coef.   Std. Err.      t    P>|t|  [95% Conf. Interval]
> -------------+----------------------------------------------------------
>           b0 |   11.59292   10.68695     1.08   0.280    -9.52 32.71048
>           b1 |   34.10476   9.433555     3.62   0.000     15.4 52.74559
>           b2 |    .998132   .0011685   854.19   0.000     .995 1.000441
> ------------------------------------------------------------------------
> * Parameter b0 taken as constant term in model & ANOVA table
> (SEs, P values, CIs, and correlations are asymptotic approximations)
>
> (2)
> . nl exp2 : passby A in 1/152
> (obs = 152)
> Iteration 0:  residual SS =  29510.02
> Iteration 1:  residual SS =  28427.14
> Iteration 2:  residual SS =  28426.97
> Iteration 3:  residual SS =  28426.97
>       Source |       SS       df       MS     Number of obs = 152
> -------------+------------------------------  F(  2,   150) = 468.32
>        Model |  177506.602     2  88753.3012  Prob > F      = 0.0000
>     Residual |  28426.9672   150  189.513115  R-squared     = 0.8620
>        Total |   205933.57   152  1354.82612  Root MSE      = 13.76638
>                                               Res. dev.     = 1226.502
> 2-parameter exp. growth curve, passby = b1*b2^A
> ------------------------------------------------------------------------
>       passby |      Coef.   Std. Err.      t    P>|t|  [95% Conf. Interval]
> -------------+----------------------------------------------------------
>           b1 |   44.54536   2.038308    21.85   0.000  40.51785 48.57286
>           b2 |   .9988862   .0001727  5783.22   0.000  .9985449 .9992275
> ------------------------------------------------------------------------
> (SEs, P values, CIs, and correlations are asymptotic approximations)
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
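The disagreement in the thread comes down to which total SS goes into R2 = 1 - SSE/SST: the one centered at the mean of y, or the uncentered sum of y^2. The do-file below is only a sketch of that arithmetic, not something posted by any of the participants. Part 1 rebuilds Steve's constant-y data and compares the two totals; Part 2 plugs in the residual and total SS printed in the two nl outputs quoted in the thread. The scalar names (sst_c, sst_u, sse, sse2, sst_c2, sst_u2) are made up for illustration, and the sketch assumes that e(rss) and e(r2) after -regress, noconstant- hold the residual SS and the uncentered R2.

```stata
* Part 1: Steve's example, y constant, regression through the origin
clear
set obs 11
generate x = _n - 1
generate y = 10

* centered total SS = (n-1)*Var(y); uncentered total SS = centered + n*mean(y)^2
quietly summarize y
scalar sst_c = (r(N) - 1) * r(Var)
scalar sst_u = sst_c + r(N) * r(mean)^2

* residual SS from the no-constant fit
quietly regress y x, noconstant
scalar sse = e(rss)

display "SSE            = " sse
display "centered SST   = " sst_c
display "uncentered SST = " sst_u
* with sst_c = 0 the centered R2 is undefined; the uncentered one is not
display "uncentered R2  = " (1 - sse/sst_u) "   e(r2) = " e(r2)

* Part 2: the numbers from the two nl outputs in the thread
* residual SS of the 2-parameter model
scalar sse2   = 28426.9672
* Total SS printed for the 3-parameter model (151 d.f., centered)
scalar sst_c2 = 39433.9482
* Total SS printed for the 2-parameter model (152 d.f., uncentered)
scalar sst_u2 = 205933.57

display "R2 with centered SST   = " %6.4f (1 - sse2/sst_c2)
display "R2 with uncentered SST = " %6.4f (1 - sse2/sst_u2)
```

Worked by hand, Part 1 gives SSE = 1100 - 550^2/385, about 314.29, against a centered SST of 0 and an uncentered SST of 1100. Part 2 gives about 0.2791 with the centered total (the figure CJ Lan reports from Matlab) and about 0.8620 with the uncentered total (the figure in the second nl output). Both R2 values for the 2-parameter model therefore rest on the same residual SS and differ only in the denominator.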