RE: st: RE: Interpretation of quadratic terms


From   Rodolphe Desbordes <rodolphe.desbordes@strath.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Interpretation of quadratic terms
Date   Wed, 10 Mar 2010 12:34:51 +0000

Dear Rosie, Nick and Roger,

To conclude this thread and summarise the main arguments put forward by Nick, Roger and myself:

A) There can be good reasons for "pre-emptive centering": a) to avoid computational issues, although these are unlikely to arise with modern econometric software such as Stata; b) to give the estimates a substantive interpretation.

However, Rosie wrote in her first message:

"To avoid multicollinearity problem with the original variable and its quadratic term, I centered the variable first (X) and then created the square term (Xsq). The model with the quadratic term (Xsq) was proved to be significantly better."

B) Centering will not magically improve the precision/accuracy of the estimates. After centering, the intercept, the coefficient on the linear term and their standard errors may differ from those obtained with the noncentered data (the coefficient on the quadratic term and the overall fit do not change), but that does not mean that the "centered model" performs better than the "noncentered model".
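
As a sketch of why the two regressions in the quoted Stata output are the same model, the quadratic can be rewritten around the mean. The symbols \beta_0, \beta_1, \beta_2 (coefficients of the uncentered regression), \alpha_0, \alpha_1 (intercept and linear coefficient of the centered regression) and m (the sample mean of mpg) are notation assumed here for illustration; they do not appear in the original messages.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 x^2 \\
  &= \underbrace{(\beta_0 + \beta_1 m + \beta_2 m^2)}_{\alpha_0}
   + \underbrace{(\beta_1 + 2\beta_2 m)}_{\alpha_1}\,(x - m)
   + \beta_2\,(x - m)^2
\end{align*}
```

With the quoted estimates and m = 21.2973, \alpha_1 = -1265.194 + 2(21.36069)(21.2973) is approximately -355.344, which matches both the lincom result and the coefficient on mpgm. Likewise \alpha_0 is approximately 5459.9 (up to rounding of the displayed figures), the _cons of the centered regression. Because the centered model is only a one-to-one linear reparameterisation of the uncentered one, the fitted values, residuals and R-squared are identical.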

Rodolphe

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roger Newson [r.newson@imperial.ac.uk]
Sent: 10 March 2010 10:44
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Interpretation of quadratic terms

Of course, pre-emptive centering might be a good idea for other reasons.
The intercept parameter is easy to explain when it is the fuel
consumption (in gallons per mile) of a car of "average" weight (because
weight has been pre-emptively centered), but less easy to explain when
it is the fuel consumption of a fantasy car with zero weight (because
weight has not been pre-emptively centered).

Roger


On 09/03/2010 21:02, Nick Cox wrote:
> I think you're both right. In olden days, pre-emptive centring, as we
> say in English, was a good idea in order to avoid numerical problems
> with mediocre programs that did not handle near multicollinearity well.
> Nowadays, decent programs including Stata take care that you get bitten
> as little as possible by such problems. Of course, if you really do have
> multicollinearity, nothing much can help, except that Stata drops
> predictors and flags the issue.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Rodolphe Desbordes
>
> My point is that centering does not reduce multicollinearity. As you can
> see in my example, the standard errors of the estimated marginal effects
> at the mean of `mpg' are the same using uncentered or centered values of
> `mpg'.
>
> Rosie Chen
>
> Thanks, Rodolphe, for this helpful demonstration. Agree that the major
> purpose of centering seems to be that we make the interpretation of X
> meaningful. I guess reducing multicollinearity is a by-product of the
> benefit.
>
> Rodolphe Desbordes <rodolphe.desbordes@strath.ac.uk>
>
> Centering will not affect your estimates and their uncertainty. However,
> centering allows you to directly obtain the estimated effect of X on Y
> for a meaningful value of X, i.e. the mean of X.
>
> . sysuse auto.dta,clear
> (1978 Automobile Data)
>
> . gen double mpg2=mpg^2
>
> . reg price mpg mpg2
>
>       Source |       SS       df       MS              Number of obs =      74
> -------------+------------------------------           F(  2,    71) =   18.28
>        Model |   215835615     2   107917807           Prob > F      =  0.0000
>     Residual |   419229781    71  5904644.81           R-squared     =  0.3399
> -------------+------------------------------           Adj R-squared =  0.3213
>        Total |   635065396    73  8699525.97           Root MSE      =  2429.9
>
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |  -1265.194   289.5443    -4.37   0.000    -1842.529   -687.8593
>         mpg2 |   21.36069   5.938885     3.60   0.001     9.518891    33.20249
>        _cons |   22716.48   3366.577     6.75   0.000     16003.71    29429.24
> ------------------------------------------------------------------------------
>
> . sum mpg
>
>      Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           mpg |        74     21.2973    5.785503         12         41
>
> . local m=r(mean)
>
> . lincom _b[mpg]+2*_b[mpg2]*`m'
>
> ( 1)  mpg + 42.59459 mpg2 = 0
>
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          (1) |  -355.3442   58.86205    -6.04   0.000    -472.7118   -237.9766
> ------------------------------------------------------------------------------
>
> . gen double mpgm=mpg-`m'
>
> . gen double mpgm2=mpgm^2
>
> . reg price mpgm mpgm2
>
>       Source |       SS       df       MS              Number of obs =      74
> -------------+------------------------------           F(  2,    71) =   18.28
>        Model |   215835615     2   107917807           Prob > F      =  0.0000
>     Residual |   419229781    71  5904644.81           R-squared     =  0.3399
> -------------+------------------------------           Adj R-squared =  0.3213
>        Total |   635065396    73  8699525.97           Root MSE      =  2429.9
>
> ------------------------------------------------------------------------------
>        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         mpgm |  -355.3442   58.86205    -6.04   0.000    -472.7118   -237.9766
>        mpgm2 |   21.36069   5.938885     3.60   0.001     9.518891    33.20249
>        _cons |   5459.933   343.8718    15.88   0.000     4774.272    6145.594
> ------------------------------------------------------------------------------
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


--
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.