Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: linear, quadratic terms and centering |
Date | Sat, 21 Aug 2010 12:43:29 -0700 |
Dear Fabio,

I created an example using the -auto- dataset myself and see the same kind of behavior you are describing in terms of VIF values, even after centering both -x1- and -x2-. In my example, I use -price- as -x1- and -weight- as -x2-. I center them both and then run the model predicting -mpg- from c.centprice##c.centprice##c.centweight.
--- snip ---
clear
sysuse auto
summarize price
generate centprice = price - `r(mean)'
summarize weight
generate centweight = weight - `r(mean)'
summ cent*
regress mpg c.centprice##c.centprice##c.centweight
vif
--- snip ---

As you can see in my example, the last VIF is about 12. The coefficients and standard errors for the last term are extremely tiny, but I think this is an issue of scaling (see my next example).
. regress mpg c.centprice##c.centprice##c.centweight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =   31.48
       Model |  1706.23426     5  341.246852           Prob > F      =  0.0000
    Residual |    737.2252    68  10.8415471           R-squared     =  0.6983
-------------+------------------------------           Adj R-squared =  0.6761
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.2927

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   centprice |  -.0004901   .0002641    -1.86   0.068     -.001017    .0000368
             |
 c.centprice#|
 c.centprice |  -1.55e-08   6.76e-08    -0.23   0.819    -1.50e-07    1.19e-07
             |
  centweight |  -.0058595   .0007058    -8.30   0.000    -.0072678   -.0044511
             |
 c.centprice#|
c.centweight |   5.53e-07   3.16e-07     1.75   0.085    -7.81e-08    1.18e-06
             |
 c.centprice#|
 c.centprice#|
c.centweight |   2.13e-11   6.39e-11     0.33   0.740    -1.06e-10    1.49e-10
             |
       _cons |   20.63903   .5813351    35.50   0.000       19.479    21.79907
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
   centprice |      4.08    0.244850
 c.centprice#|
 c.centprice |      8.78    0.113918
  centweight |      2.03    0.493619
 c.centprice#|
c.centweight |      5.13    0.194814
 c.centprice#|
 c.centprice#|
c.centweight |     12.01    0.083255
-------------+----------------------
    Mean VIF |      6.41

Now, this time I created z-scores for the variables, to change their scaling so that the coefficients and standard errors would not be so close to 0.
--- snip ---
summarize price
generate zprice = (price - `r(mean)')/`r(sd)'
summarize weight
generate zweight = (weight - `r(mean)')/`r(sd)'
summ z*
regress mpg c.zprice##c.zprice##c.zweight
vif
--- snip ---

In the output below, the p-values and the VIF values are the same as before, but the coefficients and standard errors are no longer extremely close to 0. Even though the last term has a VIF of 12, its standard error does not seem outrageously inflated.
. regress mpg c.zprice##c.zprice##c.zweight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  5,    68) =   31.48
       Model |  1706.23426     5  341.246852           Prob > F      =  0.0000
    Residual |  737.225201    68  10.8415471           R-squared     =  0.6983
-------------+------------------------------           Adj R-squared =  0.6761
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.2927

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      zprice |  -1.445663   .7788145    -1.86   0.068    -2.999763     .108437
             |
    c.zprice#|
    c.zprice |  -.1348958   .5882425    -0.23   0.819    -1.308715    1.038924
             |
     zweight |  -4.553946   .5485145    -8.30   0.000    -5.648489   -3.459402
             |
    c.zprice#|
   c.zweight |   1.266699   .7245552     1.75   0.085    -.1791279    2.712527
             |
    c.zprice#|
    c.zprice#|
   c.zweight |   .1437903   .4317442     0.33   0.740    -.7177418    1.005322
             |
       _cons |   20.63903   .5813351    35.50   0.000       19.479    21.79907
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
      zprice |      4.08    0.244850
    c.zprice#|
    c.zprice |      8.78    0.113918
     zweight |      2.03    0.493619
    c.zprice#|
   c.zweight |      5.13    0.194814
    c.zprice#|
    c.zprice#|
   c.zweight |     12.01    0.083255
-------------+----------------------
    Mean VIF |      6.41

My thinking is that VIF values like these are, perhaps, simply in the nature of this kind of model. Perhaps others have differing thoughts to share?
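Two points in this reply can be checked directly on the auto data: a VIF is 1/(1 - R^2) from regressing one term on the other terms in the model, and the z-score model differs from the centered model only by rescaling each term by a constant, which multiplies a coefficient and its standard error by the same factor and so leaves t statistics, p-values, and VIFs unchanged. The following is a minimal sketch, not part of the original thread, assuming Stata 11 or later; all new variable and scalar names (cp, cw, zp, zw, cp2w, vif3, b_c, t_c, b_z, t_z, sdp, sdw) are hypothetical.

```stata
* Minimal sketch (Stata 11+); all new variable and scalar names are hypothetical.
clear
sysuse auto

* Centered and standardized (z-score) versions of price and weight
summarize price
scalar sdp = r(sd)
generate double cp = price - r(mean)
generate double zp = cp / sdp
summarize weight
scalar sdw = r(sd)
generate double cw = weight - r(mean)
generate double zw = cw / sdw

* (1) VIF of the three-way term as 1/(1 - R^2) from an auxiliary regression
*     of that term on the other four regressors of the model
generate double cp2w = cp^2 * cw
regress cp2w c.cp##c.cp cw c.cp#c.cw
scalar vif3 = 1 / (1 - e(r2))
display "VIF of c.cp#c.cp#c.cw from auxiliary regression: " %6.2f vif3

* Compare with -vif- after the full centered model
regress mpg c.cp##c.cp##c.cw
vif
scalar b_c = _b[c.cp#c.cp#c.cw]
scalar t_c = _b[c.cp#c.cp#c.cw] / _se[c.cp#c.cp#c.cw]

* (2) Same model on z-scores: zp^2*zw = cp^2*cw/(sdp^2*sdw), so the
*     coefficient is multiplied by sdp^2*sdw and the t statistic is unchanged
regress mpg c.zp##c.zp##c.zw
scalar b_z = _b[c.zp#c.zp#c.zw]
scalar t_z = _b[c.zp#c.zp#c.zw] / _se[c.zp#c.zp#c.zw]
display "b_z / (b_c*sdp^2*sdw) = " b_z / (b_c * sdp^2 * sdw)
display "t (centered) = " t_c "   t (z-scores) = " t_z
```

As a hand check with the numbers shown above: the standard deviations of -price- and -weight- in the auto data are 2949.496 and 777.1936, and 2.13e-11 x 2949.496^2 x 777.1936 is about 0.144, in line with the .1437903 reported for the z-score model.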
Lastly, I would recommend seeing http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm, which shows some 3-D visualizations of what these models look like. There are free 3-D graphing programs you can find on the internet and use to create 3-D graphs of your results to help visualize them.
I fear I will be off email for the rest of the weekend, but I hope this helps!

Best of luck,
Michael N. Mitchell
Data Management Using Stata - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week - http://www.MichaelNormanMitchell.com

On 2010-08-21 2.20 AM, Fabio Zona wrote:
Thank you very much, Michael! One more thing: I have to fit a more complex interaction, in which a linear and a squared term are both interacted with a linear (continuous) variable x2:

y = a + x1 + x1_Square + x2 + x2*x1 + x2*x1_Square

One of my VIFs (the one for x2*x1_Square) is just above 10 (it is 10.949), while the maximum condition index reaches 19.782 (for the same interaction term, x2*x1_Square). Do I have a multicollinearity problem with this value just above the threshold of 10? How can I fix it? I know there is a Stata command, orthog; how does it work? I thought that I had to calculate the quadratic term and then use orthog on the linear and the quadratic terms. However, since you said that (when centering) I should first center the linear term and then calculate the quadratic term, I am also confused about orthog! Thanks

----- Original Message -----
From: "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com>
To: statalist@hsphsun2.harvard.edu
Sent: Saturday, 21 August 2010 10:43:12 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: Re: st: linear, quadratic terms and centering

Dear Fabio,

You were correct when you wrote...

  Or do I first center the linear term and calculate the square term on the basis of the "centered" linear term?

Or, in Stata, you can do this (say you want to center -x- around 100)...

  generate xcentered = x - 100
  regress y c.xcentered##c.xcentered

That -regress- command will include -xcentered- as well as the squared term.

Hope this helps,
Michael N. Mitchell

On 2010-08-21 12.40 AM, Fabio Zona wrote:

Dear Statalist,

a very simple question: I have

y = c + x0 + x1 + x1square

In order to center x1 and x1square, do I first need to calculate the square term and afterwards center both the linear and the quadratic term around their respective means? Or do I first center the linear term and calculate the square term on the basis of the "centered" linear term? I guess that the first alternative works. Thanks

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
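Fabio's two questions in this thread, whether to center before or after squaring and how -orthog- fits in, can be illustrated on the auto data. The following is a minimal sketch, not part of the original thread, assuming Stata 11 or later and using -weight- as a stand-in for x1; all generated variable and scalar names (wsq, w1, wsq1, wsq2, o1, o2, o1b, o2b, p1, p2, r1, r2) are hypothetical. It shows that centering x1 and its square separately leaves their correlation exactly where it was (shifting a variable never changes a correlation), while centering first and then squaring changes it, and that -orthog- produces the same orthogonalized quadratic term whether it is given the raw square or the square of the centered variable, because each spans the same space together with the constant and the linear term.

```stata
* Minimal sketch (Stata 11+): -weight- plays the role of x1;
* all new variable and scalar names are hypothetical.
clear
sysuse auto

* Alternative 1: square first, then center x1 and x1^2 separately
generate double wsq = weight^2
summarize weight
generate double w1 = weight - r(mean)
summarize wsq
generate double wsq1 = wsq - r(mean)

* Alternative 2: center x1 first, then square the centered variable
generate double wsq2 = w1^2

* Linear-quadratic correlation under each alternative
correlate w1 wsq1
scalar r1 = r(rho)
correlate w1 wsq2
scalar r2 = r(rho)
display "corr, square then center: " %6.3f r1
display "corr, center then square: " %6.3f r2

* -orthog- orthogonalizes the listed variables in order (linear first),
* so the quadratic term it creates is the part of the square not
* explained by a constant and the linear term
orthog weight wsq, generate(o1 o2)
orthog weight wsq2, generate(o1b o2b)
correlate o1 o2

* -orthpoly- builds orthogonal polynomial terms from -weight- directly
orthpoly weight, generate(p1 p2) degree(2)
```

Whichever parameterization is used, a quadratic model with a constant gives the same fitted values and R-squared; what changes is the meaning of the lower-order coefficients and how strongly they are correlated with the squared term.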