Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Re: Comparison of results from Stata and Mata

From   Matthew J Baker <>
Subject   Re: st: Re: Comparison of results from Stata and Mata
Date   Sun, 6 Feb 2011 16:53:33 -0500 (EST)

Bill --

That is a most illuminating answer! One would think that I would 
learn after all these years of being bitten by not putting in 
"double" where I should (or for trying to take the natural log of 
zero, for that matter)...

In any case: this begs the question: what is "reg" doing 
differently? While running your code makes the errors much 
closer, they still aren't identical - why is that?

Dr. Matthew J. Baker
Department of Economics
Hunter College and the Graduate Center, CUNY

---- Original message ----
>Date: Fri, 04 Feb 2011 10:23:27 -0600
>From: (on behalf of (William Gould, StataCorp LP))
>Subject: st: Re: Comparison of results from Stata and Mata   
>Matthew Baker <> wrote, 
>> I have noticed that there can be a (small) discrepancy 
between the
>> results produced by Stata's predict command after running a 
>> and what I think should generate identical results in Mata. As 
>> illustration, when running the following code:
>>       sysuse auto, clear
>>       reg price weight length
>>       predict xb
>>       gen err=price-xb
>>     mata
>>       st_view(X=.,.,"weight length")
>>       X=X,J(rows(X),1,1)
>>       st_view(Y=.,.,"price")
>>       beta=invsym(X'X)*X'Y
>>       err_mata=Y-X*beta
>>     end
>>     getmata err_mata
>>     sum err err_mata
>> I find that the results of the sum command are as follows 
>>   Variable |  Obs        Mean   Std. Dev.       Min      Max
>> -----------+------------------------------------------------
>>        err |   74   -.0000198   2382.413  -4432.274  5771.01
>>   err_mata |   74    6.20e-06   2382.413  -4432.274  5771.01
>From the results above, it -err- from  Stata's -regress- 
command is 
>less accurate than -err_mata_ made on the basis of calculating 
>syminv(X'X)*X'y.  This contradicts what I have written 
>namely that calculation-wise one can easily improve on 
>syminv(X'X)*X'y, and in fact -regress- does that.
>This is a really nice analysis of computation error.  It's nice, 
>it suffers from an error -- a trivial one -- with big 
>Once that trivial error is fixed, the -summarize- table reads,
>    Variable |  Obs        Mean   Std. Dev.       Min      Max
>  -----------+-------------------------------------------------
>         err |   74    1.30e-12   2382.413  -4432.274  5771.01
>    err_mata |   74    8.17e-10   2382.413  -4432.274  5771.01
>Thus, once again, -regress- is more accurate than 
>What happened is that Michael forget that Stata defaults to 
>new variables as floats whereas Mata stores everyting as 
doubles.  F
>loats are less precise then doubles.  There are three lines in
>Michael's code that need fixing:
>        change                  to
>        --------------------------------------------
>        predict xb              predict double xb
>        gen err=price-xb        gen double err=price-xb
>        getmata err_mata        getmata err_mata, double
>So let's understand what happened. 
>    1.  Michael used -reg price weight length- to obtain results. 
>        Those results are fully accurate.
>    2.  Michael then coded -predict xb-, and so obtained a less 
>        accurate recording of the prediction than he would have 
>        gotten using -predict double xb-.  The recording he got 
>        was not inaccurate; it was accurate to about 1 part in 
>        10,000,000, but that is not sufficient when accessing 
>        that is is 1 part in 1,000,000,000,000 (1 quadrillion).
>    3.  Michael than coded -gen err=price-xb-.  Variable price is 
>        recorded in full accuracy, but xb was rounded, and thus 
so was 
>        err.
>    4.  Put that all together, and the error appeared to be 
>        -.0000198 when it really was 1.30e-12.
>    5.  On the Mata side, everthing went swimmingly until 
>        exited Mata and coded -getmata err_mata-.  The full-
>        -err_mata- variable in Mata was copied to a less than full 
>        fedility Stata float variable -err_mata-.  That loss of 
>        changed the mean from 8.17e-10 to 6.20e-06.
>Still, it was an easy mistake to make.  More importantly, I 
don't want
>anyone thinking that they need to use -double- for their own 
>Stata does all calculations in -double- regardless of how data 
>stored.  Storing results as -float is more than sufficient for 
>real-life work.  I am about to have a blog posting on exactly 
>subject and, as a matter of fact, my current blog posting on 
"How to
>read %21x format" at is setting me up 
to address
>the subject next week.
>Meantime, here is the code to run if anyone wants to 
reproduce the 
>above results:
>sysuse auto, clear 
>reg price weight length 
>predict double xb 
>gen double err = price - xb 
>st_view(X=., ., "weight length")
>X = X, J(rows(X), 1, 1)
>st_view(Y=., ., "price") 
>beta = invsym(X'X)*X'Y
>err_mata = Y-X*beta 
>getmata err_mata, double
>sum err err_mata
>-- Bill
>*   For searches and help try:
*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index