Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"William Gould, StataCorp LP" <wgould@stata.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: matrix operations in STATA and MATA do not match! |

Date |
Fri, 05 Apr 2013 10:05:25 -0500 |

Wu Zhang <wuzhang50@yahoo.com> writes, > I am trying to calculate the ordinary least square by matrix; the > formula is beta_hat=(X'*X)^(-1)*X'*Y > > In STATA, I run the code:Â matrix beta_hat= syminv(X'*X)*X'*Y > > In MATA, I run the code: beta_hat= luinv(X'*X)*X'*Y > > Interestingly,Â all the estimates of parameters of Xs are exactly > the same while the estimates for the constant are different. Could > you guys explain how it happens? I have a do-file to show you, but first, here are my comments, some of which are demonstrated in the do-file's results: 1. Mata's -luinv()- and Stata's -syminv()- are different inverters and produce numerically different answers. I will not go into the advantages and disadvantages of each here. 2. Mata provides other inverters, too. I will not go into the advantages and disadvantages of each of them here. 3. That Mata inverter that is identical to the Stata inverter -syminv()- is -invsym()-. Yes, I know the names are turned around. The Mata name -invsym()- is more logical than than the Stata name. -invsym()- is one way of calculating the inverse of a symmetric matrix. The Stata name is older. 4. Mata's -invsym()- and Stata's -invsym()- produce identical results. 5. The results produced by all functions are nearly identical in a simple example I provide, an example that is poorly scaled. The maximum relative error is 1.95e-13. 6. I say results are "nearly identical", but I should mention that, for a "simple" problem like this one, a relative difference of 1.95e-13 against *TRUTH* would never make it out the door at StataCorp. -regress- does a far more accurate job of producing regression estimates. 7. The problem is not with -syminv()-, -invsym()-, or -luinv()-. Each of those functions is doing exactly what it is designed to do, and accurately. The problem is that (X'X)^(-1)*X'y is a lousy way of calculating regression coefficients, numerically speaking. For well-scaled problems, however, (X'X)^(-1)*X'y can do a satisfactory and even an excellent job. 8. FYI, -regress- uses -syminv()- (equal to -invsym()-), but it also uses another inverter, too, and it uses a solver. That is, -regress- inverts the matrix three times in three different ways to obtain accurate answers! Inversion is only part of the problem with (X'X)^(-1)*X'y. The other problems are the calculations of of (X'X) and X'y. Mata provides functions to calculate these values accurately. See -help mata crossdev()- and -help quadcross()-. -regress- does the accumulation deviation form along the lines of crossdev(), and in some places, uses quad precision. I apologize to Lu if it seems like I'm coming down hard on him. I do appreciate that Lu was asking an innocent and reasonable question. I ask everyone, however, to be careful and not to write Subject lines like "matrix operations in STATA and MATA do not match!" unless they are certain they are right. On the other hand, by writing that subject line, Lu did get an answer. The following Stata log may intrest Lu: ----------------------------------------------------------------------- . * . * Step 1: post some of auto.dta to Mata . * . webuse auto (1978 Automobile Data) . drop if rep78==. (5 observations deleted) . tomata price mpg weight length turn foreign . . * . * Step 2: In Mata, calculate (X'X)^(-1)*X'y . * using invsum() and luinv(). . * Show estiamtes, difference, and relative difference . * . mata: ------------------------------------------------- mata (type end to exit) ----- : X = (mpg, weight, foreign, J(st_nobs(), 1, 1)) : y = price : b1 = invsym(X'X)*X'y : b2 = luinv(X'X)*X'y : b1, b2, b1-b2, reldif(b1,b2) 1 2 3 4 +-------------------------------------------------------------+ 1 | 34.34101721 34.34101721 -6.89937e-12 1.95223e-13 | 2 | 3.6062878 3.6062878 -4.70735e-14 1.02194e-14 | 3 | 3678.584438 3678.584438 -3.54703e-11 9.63976e-15 | 4 | -6639.010119 -6639.010119 3.43789e-10 5.17754e-14 | +-------------------------------------------------------------+ : /* > * > * Step 3: Post matrices X and y back to Stata > * Calculate (X'X)^(-1)*X'y in Stata. > * Post result back to Mata. > * Show difference in between b2 and bstata. > * They will be identical. > */ : st_matrix("X", X) : st_matrix("y", y) : end ------------------------------------------------------------------------------- . matrix bstata = syminv(X'*X)*X'*y . mata: ------------------------------------------------- mata (type end to exit) ----- : bstata = st_matrix("bstata") : b1, bstata, b1-bstata 1 2 3 +----------------------------------------------+ 1 | 34.34101721 34.34101721 0 | 2 | 3.6062878 3.6062878 0 | 3 | 3678.584438 3678.584438 0 | 4 | -6639.010119 -6639.010119 0 | +----------------------------------------------+ : end ----------------------------------------------------------------------- -- Bill wgould@stata.com * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: matrix operations in STATA and MATA do not match!***From:*Wu Zhang <wuzhang50@yahoo.com>

- Prev by Date:
**[no subject]** - Next by Date:
**st: how to create a loop with variables starting with the same prefix** - Previous by thread:
**st: matrix operations in STATA and MATA do not match!** - Next by thread:
**Re: st: matrix operations in STATA and MATA do not match!** - Index(es):