Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: linear, quadratic terms and centering

From	"Michael N. Mitchell" <[email protected]>
To	[email protected]
Subject	Re: st: linear, quadratic terms and centering
Date	Mon, 23 Aug 2010 10:49:26 -0700

Dear Fabio

  I am glad that the last message helped... regarding your question...

"Even though the last term has a VIF of 12, its standard error does not seem outrageously inflated"? How can you say that it is not inflated, just after calculating z-score? If they were inflated, you would expect stderr close to zero after calculating z-score?

I have seen cases where the collinearity was so great that the standard errors blew up into the millions, billions, or even higher. Those wildly inflated standard errors led to confidence intervals that were enormous. For the example that I posted, the standard errors were not *wildly* inflated in the way I have seen before.
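
A rough way to put a number on this, assuming the usual textbook relationship VIF = 1/(1 - R-squared of that term regressed on the other terms): the standard error of a term is multiplied by sqrt(VIF) compared with what it would be if the term were uncorrelated with the other predictors. The sketch below only does that arithmetic for the VIF of 12.01 and the 1/VIF of 0.083255 reported for the last term in the example.

--- snip ---
* inflation factor for the standard error is the square root of the VIF
display sqrt(12.01)
* same quantity starting from the reported tolerance (1/VIF)
display sqrt(1/0.083255)
--- snip ---

Both lines come out at roughly 3.5, so the standard error is about three and a half times larger than in the uncorrelated case. That is noticeable, but nowhere near a blow-up into the millions.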


Best regards,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-08-22 10.43 AM, Fabio Zona wrote:

Michael,

your message is helpful; I thank you for this.

I would emphasize what you wrote below, so that others can see it better and maybe make a contribution on this topic:

Does anybody have differing thoughts to share on this subject? Are these kinds of VIF values the nature of this kind of model?


Also: how can you say that "Even though the last term has a VIF of 12, its standard error does not seem outrageously inflated"? How can you say that it is not inflated, just after calculating z-score? If they were inflated, you would expect stderr close to zero after calculating z-score?

Thanks a lot!
Fabio





----- Original message -----
From: "Michael N. Mitchell"<[email protected]>
To: [email protected]
Sent: Saturday, 21 August 2010 21:43:29 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: Re: st: linear, quadratic terms and centering

Dear Fabio

    I created an example using the -auto- dataset myself and see the same kind of behavior
you are describing in terms of VIF values, even after centering both -x1- and -x2-. In my
example, I use -price- as -x1- and -weight- as -x2-. I center them both, and then run the
model predicting mpg from c.centprice##c.centprice##c.centweight.

--- snip ---
clear
sysuse auto

summarize price
generate centprice = price - `r(mean)'
summarize weight
generate centweight = weight - `r(mean)'
summ cent*

regress mpg c.centprice##c.centprice##c.centweight
vif
--- snip ---

    As you can see in my example, the last VIF is about 12. The coefficients and standard
errors are extremely tiny for the last term, but I think this is an issue of scaling (see
my next example).

. regress mpg c.centprice##c.centprice##c.centweight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =   31.48
       Model |  1706.23426     5  341.246852           Prob > F      =  0.0000
    Residual |    737.2252    68  10.8415471           R-squared     =  0.6983
-------------+------------------------------           Adj R-squared =  0.6761
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.2927

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   centprice |  -.0004901   .0002641    -1.86   0.068     -.001017    .0000368
             |
 c.centprice#|
 c.centprice |  -1.55e-08   6.76e-08    -0.23   0.819    -1.50e-07    1.19e-07
             |
  centweight |  -.0058595   .0007058    -8.30   0.000    -.0072678   -.0044511
             |
 c.centprice#|
c.centweight |   5.53e-07   3.16e-07     1.75   0.085    -7.81e-08    1.18e-06
             |
 c.centprice#|
 c.centprice#|
c.centweight |   2.13e-11   6.39e-11     0.33   0.740    -1.06e-10    1.49e-10
             |
       _cons |   20.63903   .5813351    35.50   0.000       19.479    21.79907
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
   centprice |      4.08    0.244850
 c.centprice#|
 c.centprice |      8.78    0.113918
  centweight |      2.03    0.493619
 c.centprice#|
c.centweight |      5.13    0.194814
 c.centprice#|
 c.centprice#|
c.centweight |     12.01    0.083255
-------------+----------------------
    Mean VIF |      6.41


    Now, this time I created z scores for the variables, to try and change the scaling of
the variables so the coefficients and standard errors would not be so close to 0.

--- snip ---
summarize price
generate zprice = (price - `r(mean)')/`r(sd)'
summarize weight
generate zweight = (weight - `r(mean)')/`r(sd)'
summ z*

regress mpg c.zprice##c.zprice##c.zweight
vif
--- snip ---

    In the output below the p values are the same, and the VIF values are the same, but the
coefficients and standard errors are not super close to 0. Even though the last term has a
VIF of 12, its standard error does not seem outrageously inflated.

. regress mpg c.zprice##c.zprice##c.zweight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =   31.48
       Model |  1706.23426     5  341.246852           Prob > F      =  0.0000
    Residual |  737.225201    68  10.8415471           R-squared     =  0.6983
-------------+------------------------------           Adj R-squared =  0.6761
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.2927

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      zprice |  -1.445663   .7788145    -1.86   0.068    -2.999763     .108437
             |
    c.zprice#|
    c.zprice |  -.1348958   .5882425    -0.23   0.819    -1.308715    1.038924
             |
     zweight |  -4.553946   .5485145    -8.30   0.000    -5.648489   -3.459402
             |
    c.zprice#|
   c.zweight |   1.266699   .7245552     1.75   0.085    -.1791279    2.712527
             |
    c.zprice#|
    c.zprice#|
   c.zweight |   .1437903   .4317442     0.33   0.740    -.7177418    1.005322
             |
       _cons |   20.63903   .5813351    35.50   0.000       19.479    21.79907
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
      zprice |      4.08    0.244850
    c.zprice#|
    c.zprice |      8.78    0.113918
     zweight |      2.03    0.493619
    c.zprice#|
   c.zweight |      5.13    0.194814
    c.zprice#|
    c.zprice#|
   c.zweight |     12.01    0.083255
-------------+----------------------
    Mean VIF |      6.41
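
    The rescaling can be checked by hand. This is only a sketch: it assumes the -auto- dataset shipped with Stata, it types in the rounded coefficients from the centered output, and the scalar names -sdp- and -sdw- are made up for illustration. A coefficient on a z-scored term equals the centered coefficient multiplied by the standard deviation of each variable in that term, and the standard error is multiplied by the same factor, so the t statistic (their ratio) cannot change.

--- snip ---
sysuse auto, clear
quietly summarize price
scalar sdp = r(sd)
quietly summarize weight
scalar sdw = r(sd)

* linear price term: coefficient and standard error both scale by sd(price)
display -.0004901 * sdp
display  .0002641 * sdp

* the common factor cancels in the t statistic
display (-.0004901 * sdp) / (.0002641 * sdp)

* triple term: both scale by sd(price)^2 * sd(weight)
display 2.13e-11 * sdp^2 * sdw
display 6.39e-11 * sdp^2 * sdw
--- snip ---

    Up to rounding of the typed-in values, the results should land close to the zprice and c.zprice#c.zprice#c.zweight rows of the z-score output, and the ratio should be close to the reported t of -1.86.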

    My thinking is that these kinds of VIF values are, perhaps, the nature of this kind of
model. Perhaps others have differing thoughts to share?
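
    For anyone who wants to see where the 12.01 comes from, here is a minimal sketch that rebuilds the VIF of the last term by hand, again assuming the -auto- dataset. The variable names -p2-, -pw- and -p2w- are hypothetical. The VIF of a term is 1/(1 - R-squared) from regressing that term on all of the other terms in the model; the outcome -mpg- plays no role in it.

--- snip ---
sysuse auto, clear
quietly summarize price
generate double zprice = (price - r(mean)) / r(sd)
quietly summarize weight
generate double zweight = (weight - r(mean)) / r(sd)

* build the product terms explicitly
generate double p2  = zprice^2
generate double pw  = zprice * zweight
generate double p2w = zprice^2 * zweight

* regress the last term on the other four terms
quietly regress p2w zprice p2 zweight pw
display "R-squared = " e(r2) "   VIF = " 1/(1 - e(r2))
--- snip ---

    The displayed VIF should agree with the value -vif- reports for c.zprice#c.zprice#c.zweight. A VIF near 12 therefore means that about 92% of the variation in that product term is predictable from the lower-order terms, which is not surprising for a term built from them.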

    Lastly, I would recommend seeing...

http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm

    which shows some 3-d visualizations of what these models look like. There are free 3d
graphing programs you can search for and use over the internet to create 3d graphs of your
results to help visualize them.

    I am afraid I will be off email the rest of the weekend, but I hope this helps!

Best of luck,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-08-21 2.20 AM, Fabio Zona wrote:

Thank you very much Michael!

One more piece of information: I have to model a complex interaction, whereby a linear and a squared term are both interacted with a linear (continuous) variable x2:

y = a + x1 + x1_Square + x2 + x2*x1 + x2*x1_Square


One of my VIFs (the one for x2*x1_Square) is just above 10 (it is 10.949), while the maximum condition index reaches 19.782 (for the same interaction term x2*x1_Square).

Do I have a multicollinearity problem with this value just above the threshold of 10? How can I fix it?

I know there is a command in Stata called -orthog-; how does this command work? I thought that I had to calculate the quadratic term and then use -orthog- on the linear and the quadratic term. However, since you said that (for centering variables) I first center the linear term and then calculate the quadratic term, I am also confused about -orthog-!
Thanks
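
For reference, a minimal sketch of how -orthog- and -orthpoly- are typically called, using -price- from the -auto- dataset as a stand-in for x1. The new variable names (-price2-, -oprice-, -oprice2-, -pp1-, -pp2-) are hypothetical. With -orthog- you build the raw quadratic term yourself and pass both variables, with no centering needed beforehand; -orthpoly- builds the orthogonal polynomial terms directly from the single variable.

--- snip ---
sysuse auto, clear

* raw linear and quadratic terms are highly correlated
generate double price2 = price^2
correlate price price2

* orthogonalize the two terms; the new variables should be uncorrelated
orthog price price2, generate(oprice oprice2)
correlate oprice oprice2

* alternative: orthogonal polynomial terms of degree 2 from price alone
orthpoly price, generate(pp1 pp2) degree(2)
correlate pp1 pp2
--- snip ---

Note that coefficients on the orthogonalized variables are no longer in the original units of x1, so they are harder to interpret than coefficients on a centered term and its square.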






----- Original message -----
From: "Michael N. Mitchell"<[email protected]>
To: [email protected]
Sent: Saturday, 21 August 2010 10:43:12 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: Re: st: linear, quadratic terms and centering

Dear Fabio

     You were correct when you wrote...

Or do I first center the linear term and calculate the square term on the basis of the
"centered" linear term ?

     Or, in Stata, you can do this (say you want to center -x- around 100)...

generate xcentered = x - 100
regress y c.xcentered##c.xcentered

     That -regress- command will include -xcentered- as well as the squared term.
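
     To see why the order matters, here is a small sketch on the -auto- dataset, with -weight- standing in for x and centering at the mean instead of at 100. The names -cweight-, -weight2- and -cweight2- are hypothetical. It compares the correlation between a variable and its square before and after centering, and then fits the same quadratic model twice: once with a hand-built squared term and once with the factor-variable notation.

--- snip ---
sysuse auto, clear
quietly summarize weight
generate double cweight  = weight - r(mean)

* square of the raw variable versus square of the centered variable
generate double weight2  = weight^2
generate double cweight2 = cweight^2

* correlation between linear and squared term, raw then centered
correlate weight weight2
correlate cweight cweight2

* the two specifications below describe the same model
regress mpg cweight cweight2
regress mpg c.cweight##c.cweight
--- snip ---

     The factor-variable version has the added benefit that Stata knows the squared term is a function of -cweight-, which matters for commands such as -margins-.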

Hope this helps,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-08-21 12.40 AM, Fabio Zona wrote:

Dear Statalist,

a very simple question: I have

y = c + x0 + x1 + x1square

In order to center x1 and x1square, do I first need to calculate the square term and afterwards to center both the linear and the quadratic term around their respective means?
Or do I first center the linear term and calculate the square term on the basis of the "centered" linear term ?

I guess that the first alternative works.

Thanks

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

