[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
wgould@stata.com (William Gould, Stata) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Efficient way to solve for coefficents and standard errors ? |

Date |
Mon, 28 Nov 2005 10:24:50 -0600 |

Thomas Cornelissen <cornelissen@mbox.iqw.uni-hannover.de> writes, > I want to implement a regression routine in Mata in order to use a large > data set. I have set up the system of moment equations A*beta = B. > I need to solve for beta and I need the covariance matrix of beta given by > Cov(beta)=A^(-1)*(u'u)/(n-k). I want to get its diagonal elements and I am > not interested in the off-diagonal elements. > > Because I am concerned by memory and time restrictions I am looking for the > most efficient way to solve for beta and for the standard errors. I have > considered: > > Way 1: Solve for beta -cholsolve(A,B)- and then obtain A^(-1) by > -cholinv(A)- > > I am troubled by the fact that I am so-to-speak inverting A twice > (i.e. using the solver twice). Isn't that inefficient? Accurracy wise, what Thomas just suggested is best, and as a matter of fact, is very nearly what -regress- does: 1. -regress- forms A as deviations from the mean. 2. -regress- uses function -invsym()- to invert A rather than -cholinv()- because -invym()- can handle collinearities. 3. If collinearities are found in step 2, then the appropropriate rows and columns from A are removed. 4. -regress- solves for b. 5. -regress- now does the work to estimate the intercept and adjust that VCE obtained in step 2. That is, -regress- "so-to-speak inverts A twice". The extra computational cost of using both a solver and an inverter is not much becuase the the inversion cost is a function of k (the number of variables in the model), not N (the number of observations), and the accuracy gain in the resulting estimated coefficients can be large. -regress- does two more things Thomas does not propose to do, and Thomas may want to consider adding one of them. In addition to "inverting twice", -regress- handles scale (steps 1 and 5) and it handles collinearity (uses -invsym() rather than -cholinv()- and step 3). The handling of collinearity can be safely skipped. Thomas' proposed code will simply fail when given a collinear model, and that is certainliy reasonable behavior. Concerning scale, my usual reply would be that taking deviations from the mean does imporove accuracy, but as long as the function is used with varaibles already normalized so that the means are not too widly different, that will not matter. Thomas said, however, that he is writting this function for use with a large dataset, and I suspect he means N is very large. The importance of mean removal grows with N, and somewhere around 500,000 or 1,000,000 observations, it becomes important even when the means are already roughly equal. Consider the sum of x, or the sum of x^2. Assume x has mean m. The contribution of the mean to the sum of x is N*m, and the contribution to the sum of x^2 is (N*m)^2 + Sum_i 2*m*(x_i-m). In both formulas, the important part is N*m. As N grows, the amount of precision required just to hold that part grows, and so leaves less and less precision for handling the wiggle part (the deviation from the mean). Yet, in the regression calculation, it is the wiggle that matters. Ergo, for N large, I would either remove the means in the function (Mata's accumulation routines can do that automatically), or I would only use my function on variables already normalized to have mean 0. -- Bill wgould@stata.com * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**st: RE: RE: RE: generating a new variable with the egen command** - Next by Date:
**st: Mata question.** - Previous by thread:
**st: Efficient way to solve for coefficents and standard errors ?** - Next by thread:
**st: Classification after mlogit** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |