
From: wgould@stata.com (William Gould, Stata)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Overriding dropping of collinear variables
Date: Wed, 09 Jul 2003 09:03:44 -0500

James Valcour <jvalcour@rogers.com> asked,

> Is there a way to tell Stata (either v7 or v8) not to drop collinear
> variables? Usually I wouldn't try and do this, but I'm trying to compare
> some output produced from SAS's proc genmod with some glm output from Stata.
> SAS doesn't automatically drop collinear variables. I'm trying to do this
> because all the examples from a course I recently took were in SAS and I'm
> trying to see if I can get the same results from Stata.

Actually, I believe that SAS too will drop collinear variables, and I suspect this is a case where Stata is declaring a variable collinear and SAS is not.

Scott Merryman <smerryman@kc.rr.com>, in responding to the question by James, noted that SAS will produce the message

    NOTE: The X'X matrix has been found to be singular and a generalized
    inverse was used to solve the normal equations. Estimates followed
    by the letter 'B' are biased, and are not unique estimators of the
    parameters.

and then worried that employing a generalized inverse as a solution might be dangerous. The fact is that the solution Stata implements can also be viewed as (and is) a generalized-inverse solution. There are lots of "generalized" inverses, of which dropping the variables is one.

When are variables collinear?
-----------------------------

In textbooks, authors write about "perfectly collinear" variables, by which they mean the correlation between the two variables is exactly 1. For instance, the following two variables are perfectly collinear:

    x1    x2
     1     2
     2     4
     3     6

In the real world of statistical computing, things are seldom so clear cut. Computers work in binary and you think in decimal, meaning that when you input 6.1, the computer does not really store 6.1 exactly. From those numbers, users generate calculated values, such as x2^2. On top of that, the finite-precision calculations may subsequently lead to round-off error.
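The point about 6.1 can be checked directly. The following is a minimal sketch in Python rather than Stata, chosen only because it can print the exact binary value; the variable names are illustrative and nothing here is specific to how Stata or SAS stores data.

```python
from decimal import Decimal

# Decimal(float) reveals the exact binary value stored for the literal 6.1.
stored = Decimal(6.1)
print(stored)

# The stored value is not the decimal number 6.1.
stored_is_exact = (stored == Decimal("6.1"))
print(stored_is_exact)  # False

# A calculated value such as x^2 inherits that error and adds its own rounding.
squared = Decimal(6.1 ** 2)
squared_is_exact = (squared == Decimal("6.1") ** 2)
print(squared_is_exact)  # False
```

The differences are tiny, but they are the raw material from which any collinearity decision has to be made.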
By the time the computer is studying the problem, calculation does not lead to clear-cut 0s and 1s, but to numbers like 1e-12 and .9999999999997. It is from numbers like those that the decision has to be made.

How one makes that decision depends not only on the amount of numerical round-off error -- something one can analyze and have good knowledge about -- but also on the original accuracy to which the data were measured. Consider the following data:

    x1    x2
     1     2
     2     4
     3     6.000000000001

Tell me that the measurement was made by a physicist in a certain context, and I might actually believe he or she measured 6.000000000001. The two sequences are not collinear and something very small is going on. Tell me they were made from economic data, and I will immediately suspect that the .000000000001 part is round-off error from some earlier calculation.

My point is that there is no right answer and so, in a few cases, it will not surprise me if Stata and SAS disagree.

Changing when Stata declares variables collinear
------------------------------------------------

There is an undocumented way you can control when Stata will determine collinearity. In Stata-undocumented-speak it is called "tol 1" or, in the cases of -anova- and -manova-, "tol 2". "tol 1" affects how Stata inverts matrices in all cases except -anova- and -manova-. "tol 1" is irrelevant in the cases of -anova- and -manova-, and "tol 2" is the relevant parameter.

The default values of these two parameters are

    tol 1 = 1.0e-9
    tol 2 = 1.0e-8

You can reset them. If you make them smaller, Stata will be less likely to declare collinearity. Set them larger, and Stata will be more likely.

I am about to tell you how to set them but I warn you, reset them and we wash our hands of you. Set the number too small, and you might cause Stata to crash. Set the number larger than that, but still too small, and in truly collinear cases, you can end up with estimates based on nothing but numerical round-off error.
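To make the role of a tolerance concrete, here is a rough sketch in Python, not Stata. It is emphatically not the rule Stata applies when inverting matrices: the functions `leftover_share` and `is_collinear`, and the criterion they use (the share of x2's sum of squares left over after removing a multiple of x1), are assumptions made up for this illustration, so the tolerance values are not comparable to tol 1 or tol 2.

```python
def leftover_share(x1, x2):
    """Share of x2's sum of squares not explained by a multiple of x1."""
    b = sum(a * c for a, c in zip(x1, x2)) / sum(a * a for a in x1)
    resid = [c - b * a for a, c in zip(x1, x2)]
    return sum(r * r for r in resid) / sum(c * c for c in x2)


def is_collinear(x1, x2, tol):
    # Hypothetical rule: call x2 collinear with x1 when almost nothing is left over.
    return leftover_share(x1, x2) <= tol


x1 = [1.0, 2.0, 3.0]
exact = [2.0, 4.0, 6.0]
nearly = [2.0, 4.0, 6.000000000001]

print(is_collinear(x1, exact, tol=1e-9))    # True
print(is_collinear(x1, nearly, tol=1e-9))   # True: the tiny difference is treated as noise
print(is_collinear(x1, nearly, tol=1e-30))  # False: a much smaller tolerance keeps x2
```

The same data give opposite answers depending only on the tolerance, which is why two packages with different thresholds can disagree about whether a variable should be dropped.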
Set the number too large, and Stata will drop variables left and right.

I cannot tell you, however, that we have set the numbers right. What I can tell you is that we have carefully considered the problem and that Stata now has a long history of using the numbers as we have set them, a history incorporating literally millions of matrix inversions, and users have seldom complained.

That said, here's how you can reset tol 1 to 1.0e-6:

    . set debug on
    . set tol 1 1e-6
    . set debug off

You can reset tol 2 similarly.

The -set tol- and -set debug- commands are undocumented, "secret" commands. -set tol- will not work unless Stata is in -debug- mode; this way, no one can accidentally change these critical values. I recommend against running Stata in -debug- mode because some commands produce a lot of output that you do not want to see.

To reset tol 1 back to its officially endorsed value, you could

    . set debug on
    . set tol 1 1e-9
    . set debug off

but I recommend simply exiting and relaunching Stata. That is to say, nothing you do resetting tol 1 or tol 2 will change Stata permanently.

If you are trying to change the behavior of -anova- or -manova-, change "tol 2" rather than "tol 1".

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

