# st: RE: Beta coefficients are not equal to coefficients on standardized variables?

From: Kieran McCaul <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: Beta coefficients are not equal to coefficients on standardized variables?
Date: Sat, 16 Jun 2012 14:03:24 +0800

```...

Don't standardize the dependent variable.

clear*
sysuse auto

regress weight length turn displacement, beta

egen length_std = std( length )
egen turn_std = std(turn)
egen displacement_std = std(displacement)

regress weight length_std turn_std displacement_std , beta

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roberto Liebscher
Sent: Friday, 15 June 2012 11:23 PM
To: [email protected]
Subject: st: Beta coefficients are not equal to coefficients on standardized variables?

There is one thing that puzzles me about the - beta - option in
regression commands. In a simple example using the lifeexp dataset I
first used the built-in option - beta - :

sysuse lifeexp

regress lexp gnppc popgrowth, beta

. regress lexp gnppc popgrowth, beta

      Source |       SS       df       MS              Number of obs =      63
-------------+------------------------------           F(  2,    60) =   36.20
       Model |  777.530873     2  388.765436           Prob > F      =  0.0000
    Residual |  644.405635    60  10.7400939           R-squared     =  0.5468
-------------+------------------------------           Adj R-squared =  0.5317
       Total |  1421.93651    62  22.9344598           Root MSE      =  3.2772

------------------------------------------------------------------------------
        lexp |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
       gnppc |    .000293   .0000419     6.99   0.000                 .6506803
   popgrowth |  -.9833919    .485387    -2.03   0.047                -.1885781
       _cons |   70.67366   .8071596    87.56   0.000                        .
------------------------------------------------------------------------------

Then I standardized the variables by hand and re-ran the regression with
the new variables:

. egen popgrowth_std = std(popgrowth)

. egen lexp_std = std(lexp)

. egen gnppc_std = std(gnppc)
(5 missing values generated)

regress lexp_std gnppc_std popgrowth_std

      Source |       SS       df       MS              Number of obs =      63
-------------+------------------------------           F(  2,    60) =   36.20
       Model |  34.9700449     2  17.4850225           Prob > F      =  0.0000
    Residual |  28.9826364    60  .483043939           R-squared     =  0.5468
-------------+------------------------------           Adj R-squared =  0.5317
       Total |  63.9526813    62  1.03149486           Root MSE      =  .69501

------------------------------------------------------------------------------
    lexp_std |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gnppc_std |   .6608475   .0945336     6.99   0.000     .4717521    .8499428
popgrowth_~d |  -.1942026   .0958554    -2.03   0.047    -.3859419   -.0024633
       _cons |  -.0042032   .0875655    -0.05   0.962    -.1793602    .1709538
------------------------------------------------------------------------------

Now the coefficients are slightly different. For example, the coefficient
on gnppc_std is 0.6608475, whereas it was 0.6506803 in the first
regression.

Is this caused by rounding errors? Or is there any other explanation for
this?

Roberto
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
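One reading of the output quoted in the question, offered as a sketch rather than as part of the original exchange: the hand-standardized regression reports a Total MS of 1.03149486 and a nonzero _cons, so lexp_std has neither unit variance nor zero mean within the 63 observations that -regress- used. That is what one would expect if -egen, std()- standardized each variable over all of its nonmissing observations while -regress- dropped the 5 observations with missing gnppc. Assuming that is the cause, standardizing within the estimation sample should reproduce the Beta column. The names insample, lexp_z, gnppc_z and popgrowth_z are illustrative and do not appear in the thread.

```stata
* Sketch: standardize within the estimation sample, then compare with -beta-
sysuse lifeexp, clear

regress lexp gnppc popgrowth, beta

* Flag the observations that -regress- actually used (63 in the quoted output)
generate byte insample = e(sample)

* std() honours the -if- qualifier, so each mean and SD is computed
* from the flagged observations only
egen lexp_z      = std(lexp)      if insample
egen gnppc_z     = std(gnppc)     if insample
egen popgrowth_z = std(popgrowth) if insample

* Assumption: with all three variables standardized on the same sample,
* these coefficients should agree with the Beta column above
regress lexp_z gnppc_z popgrowth_z
```

As a rough check on this reading, the square root of 1.03149486 is about 1.0156, which is also the ratio of the two gnppc coefficients in the question (.6608475 / .6506803).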