An improved R-squared                                         [STB-14: sg18]
---------------------

    ^brsq^ yvar [xvars] [^if^ exp] [^in^ range] [weight] [ ^, b^oxcox^(^Lvar | L^)
        ^t^vars^(^tvars^) r^eweight ^l^eave regression_options ]

calculates deviances, R-squared and adjusted R-squared for two regression
models.  The models are yvar on xvars and f(yvar) on xvars, where f(.) is the
normalized Box-Cox (power) transformation

    f(yvar) = (yvar^^L - 1)/(L * Ydot^^(L - 1))

and Ydot is the geometric mean of yvar.  If L = 0 then
f(yvar) = Ydot * log(yvar), essentially just a log transformation.

The deviance is twice the negative log likelihood.  `Deviance-0' is the
deviance for a model with just a constant.

The novel feature is that the values of R-squared are "scale-corrected" to
the scale of yvar.  This allows for the effect of the transformation on the
coefficient of determination for the transformed model and makes valid the
comparison with R-squared for the untransformed model.


Options
-------

^boxcox()^ is the Box-Cox power parameter.  You may use a variable (Lvar) or a
    constant (L).  Default: constant 0 (log transformation).

^reweight^ computes the weights for the Box-Cox regression to allow for the
    power transformation, using the formula

        reweight = weight * (Ydot/Yfit)^^(2 * L - 2),

    where Yfit is the fit from the (weighted) regression of yvar on xvars.
    The default option (^noreweight^) forces both models to use the same
    weights (1, or as given by the ^weight^ variable).  Thus use of ^reweight^
    employs one set of weights which should be appropriate for
    (untransformed) yvar, that is, should be proportional to the reciprocal
    of its variance, whereas use of ^noreweight^ compares two regression
    models AND two systems of weights simultaneously.

^leave^ creates three new variables: _fy = f(yvar) as above, and _fyf and _sfy,
    the fitted values and SD of _fy estimated from the regression analysis.


Options, continued
------------------

^tvars()^ allows you to specify a different set, tvars, of regressors for
    f(yvar).
    There is in general no reason to expect the model for yvar to be
    satisfactory for f(yvar) also.

regression_options are any of the standard options available with the
    ^regress^ command.


Example
-------

With the Stata example file ^auto.dta^, we wish to compare a model for ^mpg^ as a
quadratic function of automobile ^weight^ with a similar model which uses
log(^mpg^):

    . ^use auto^
    . ^gen w = weight/1000^     /* scales ^weight^ to more reasonable values */
    . ^gen w2 = w^^2^
    . ^brsq mpg w w2^
    . ^brsq mpg w w2, reweight^

The values of R-squared for the two analyses are as follows:

                          Untransformed           Transformed
                                             ^reweight^    ^noreweight^
    R-squared-Scaled         0.6722           0.7098       0.7565
    Adj-R-sq-Scaled          0.6630           0.7016       0.7496

The scaled R-squared is highest for the transformed model without
reweighting.  Assuming that the quadratic model is an adequate fit on both
the original and the log scales, this implies that (a) the log transformation
is useful (it is probably countering skewness in the distribution of ^mpg^)
and (b) the variance is more constant on the log scale than on the original
scale.

Note that the unadjusted and adjusted values of R-squared-scaled for the
untransformed model (0.6722 and 0.6630) are identical to the R-squared values
given by the ^regress^ command.  However (and this is the whole point of "An
improved R-squared") the corresponding values for the transformed model
without reweighting (0.7565 and 0.7496) DIFFER from those obtained by
regressing log(^mpg^) on ^w^ and ^w2^.  In this example, the latter values are
lower: 0.7158 and 0.7078.

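The transformation and reweighting formulas given in the description can be
checked numerically outside Stata.  The following is a minimal Python sketch
under stated assumptions, not the ^brsq^ implementation: the function names
(geometric_mean, normalized_boxcox, reweight) are hypothetical and chosen for
illustration only, yvar is assumed strictly positive, and the fitted values
Yfit are assumed to be supplied by the caller.  The scale-corrected R-squared
itself is not reproduced here.

```python
import math


def geometric_mean(y):
    """Geometric mean (Ydot) of strictly positive values."""
    if any(v <= 0 for v in y):
        raise ValueError("all values must be strictly positive")
    return math.exp(sum(math.log(v) for v in y) / len(y))


def normalized_boxcox(y, power=0.0):
    """Normalized Box-Cox transform f(y) as stated in the help text.

    power != 0: (y^L - 1) / (L * Ydot^(L - 1))
    power == 0: Ydot * log(y)
    """
    ydot = geometric_mean(y)
    if power == 0:
        return [ydot * math.log(v) for v in y]
    scale = power * ydot ** (power - 1)
    return [(v ** power - 1) / scale for v in y]


def reweight(weights, y, yfit, power):
    """Weights from the help text: weight * (Ydot / Yfit)^(2L - 2).

    yfit is assumed to hold fitted values from a regression of y on the
    regressors; fitting that regression is outside this sketch.
    """
    ydot = geometric_mean(y)
    return [w * (ydot / f) ** (2 * power - 2) for w, f in zip(weights, yfit)]
```

With power = 1 the transform reduces to yvar - 1 and the weights are left
unchanged, which is a quick sanity check on both formulas.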

Stored results
--------------

    ^$S_1^     Scaled deviance (untransformed yvar)
    ^$S_2^     Scaled deviance (transformed yvar)
    ^$S_3^     R-squared (untransformed yvar)
    ^$S_4^     R-squared-scaled (transformed yvar)
    ^$S_5^     Adjusted R-squared (untransformed yvar)
    ^$S_6^     Adjusted R-squared-scaled (transformed yvar)
    ^$S_7^     Residual SD (untransformed yvar)
    ^$S_8^     Scaled residual SD (transformed yvar)
    ^$S_9^     F-ratio (untransformed yvar)
    ^$S_10^    Scaled F-ratio (transformed yvar)


Authors
-------

    Patrick Royston, Royal Postgraduate Medical School
    Richard Goldstein, Qualitas Inc.


Also see
--------

    STB:  sg18 (STB-14), srd7 (STB-5)