# Re: st: Re: Comparison of results from Stata and Mata

 From Matthew J Baker To statalist@hsphsun2.harvard.edu Subject Re: st: Re: Comparison of results from Stata and Mata Date Sun, 6 Feb 2011 16:53:33 -0500 (EST)

Bill --

That is a most illuminating answer! One would think that I would
learn after all these years of being bitten by not putting in
"double" where I should (or for trying to take the natural log of
zero, for that matter)...

In any case: this begs the question: what is "reg" doing
differently? While running your code makes the errors much
closer, they still aren't identical - why is that?

Dr. Matthew J. Baker
Department of Economics
Hunter College and the Graduate Center, CUNY

---- Original message ----
>Date: Fri, 04 Feb 2011 10:23:27 -0600
>From: owner-statalist@hsphsun2.harvard.edu (on behalf of
wgould@stata.com (William Gould, StataCorp LP))
>Subject: st: Re: Comparison of results from Stata and Mata
>To: statalist@hsphsun2.harvard.edu
>
>Matthew Baker <matthew.baker@hunter.cuny.edu> wrote,
>
>> I have noticed that there can be a (small) discrepancy
between the
>> results produced by Stata's predict command after running a
regression
>> and what I think should generate identical results in Mata. As
an
>> illustration, when running the following code:
>>
>>       sysuse auto, clear
>>       reg price weight length
>>       predict xb
>>       gen err=price-xb
>>
>>     mata
>>       st_view(X=.,.,"weight length")
>>       X=X,J(rows(X),1,1)
>>       st_view(Y=.,.,"price")
>>       beta=invsym(X'X)*X'Y
>>       err_mata=Y-X*beta
>>     end
>>
>>     getmata err_mata
>>     sum err err_mata
>>
>> I find that the results of the sum command are as follows
[...]
>>
>>   Variable |  Obs        Mean   Std. Dev.       Min      Max
>> -----------+------------------------------------------------
>>        err |   74   -.0000198   2382.413  -4432.274  5771.01
>>   err_mata |   74    6.20e-06   2382.413  -4432.274  5771.01
>
>From the results above, it -err- from  Stata's -regress-
command is
>less accurate than -err_mata_ made on the basis of calculating
>syminv(X'X)*X'y.  This contradicts what I have written
previously,
>namely that calculation-wise one can easily improve on
>syminv(X'X)*X'y, and in fact -regress- does that.
>
>This is a really nice analysis of computation error.  It's nice,
but
>it suffers from an error -- a trivial one -- with big
consequences.
>Once that trivial error is fixed, the -summarize- table reads,
>
>
>    Variable |  Obs        Mean   Std. Dev.       Min      Max
>  -----------+-------------------------------------------------
>         err |   74    1.30e-12   2382.413  -4432.274  5771.01
>    err_mata |   74    8.17e-10   2382.413  -4432.274  5771.01
>
>Thus, once again, -regress- is more accurate than
syminv(X'X)*X'y.
>
>What happened is that Michael forget that Stata defaults to
storing
>new variables as floats whereas Mata stores everyting as
doubles.  F
>loats are less precise then doubles.  There are three lines in
>Michael's code that need fixing:
>
>        change                  to
>        --------------------------------------------
>        predict xb              predict double xb
>        gen err=price-xb        gen double err=price-xb
>        getmata err_mata        getmata err_mata, double
>
>So let's understand what happened.
>
>    1.  Michael used -reg price weight length- to obtain results.
>        Those results are fully accurate.
>
>    2.  Michael then coded -predict xb-, and so obtained a less
>        accurate recording of the prediction than he would have
>        gotten using -predict double xb-.  The recording he got
>        was not inaccurate; it was accurate to about 1 part in
>        10,000,000, but that is not sufficient when accessing
accuracy
>        that is is 1 part in 1,000,000,000,000 (1 quadrillion).
>
>    3.  Michael than coded -gen err=price-xb-.  Variable price is
>        recorded in full accuracy, but xb was rounded, and thus
so was
>        err.
>
>    4.  Put that all together, and the error appeared to be
>        -.0000198 when it really was 1.30e-12.
>
>    5.  On the Mata side, everthing went swimmingly until
Michael
>        exited Mata and coded -getmata err_mata-.  The full-
precision
>        -err_mata- variable in Mata was copied to a less than full
>        fedility Stata float variable -err_mata-.  That loss of
precision
>        changed the mean from 8.17e-10 to 6.20e-06.
>
>Still, it was an easy mistake to make.  More importantly, I
don't want
>anyone thinking that they need to use -double- for their own
work.
>Stata does all calculations in -double- regardless of how data
are
>stored.  Storing results as -float is more than sufficient for
most
>real-life work.  I am about to have a blog posting on exactly
that
>subject and, as a matter of fact, my current blog posting on
"How to
>read %21x format" at http://blog.stata.com/ is setting me up
>the subject next week.
>
>Meantime, here is the code to run if anyone wants to
reproduce the
>above results:
>
>-------------------------------------------
>sysuse auto, clear
>reg price weight length
>predict double xb
>gen double err = price - xb
>
>mata:
>st_view(X=., ., "weight length")
>X = X, J(rows(X), 1, 1)
>st_view(Y=., ., "price")
>beta = invsym(X'X)*X'Y
>err_mata = Y-X*beta
>end
>getmata err_mata, double
>sum err err_mata
>-------------------------------------------
>
>-- Bill
>wgould@stata.com
>*
>*   For searches and help try:
>*   http://www.stata.com/help.cgi?search
>*   http://www.stata.com/support/statalist/faq
>*   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/