Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Re: Comparison of results from Stata and Mata


From   Matthew J Baker <[email protected]>
To   [email protected]
Subject   Re: st: Re: Comparison of results from Stata and Mata
Date   Sun, 6 Feb 2011 16:53:33 -0500 (EST)

Bill --

That is a most illuminating answer! One would think that I would 
learn after all these years of being bitten by not putting in 
"double" where I should (or for trying to take the natural log of 
zero, for that matter)...

In any case: this begs the question: what is "reg" doing 
differently? While running your code makes the errors much 
closer, they still aren't identical - why is that?

Dr. Matthew J. Baker
Department of Economics
Hunter College and the Graduate Center, CUNY

---- Original message ----
>Date: Fri, 04 Feb 2011 10:23:27 -0600
>From: [email protected] (on behalf of 
[email protected] (William Gould, StataCorp LP))
>Subject: st: Re: Comparison of results from Stata and Mata   
>To: [email protected]
>
>Matthew Baker <[email protected]> wrote, 
>
>> I have noticed that there can be a (small) discrepancy 
between the
>> results produced by Stata's predict command after running a 
regression
>> and what I think should generate identical results in Mata. As 
an
>> illustration, when running the following code:
>>
>>       sysuse auto, clear
>>       reg price weight length
>>       predict xb
>>       gen err=price-xb
>>
>>     mata
>>       st_view(X=.,.,"weight length")
>>       X=X,J(rows(X),1,1)
>>       st_view(Y=.,.,"price")
>>       beta=invsym(X'X)*X'Y
>>       err_mata=Y-X*beta
>>     end
>>
>>     getmata err_mata
>>     sum err err_mata
>>
>> I find that the results of the sum command are as follows 
[...]
>>
>>   Variable |  Obs        Mean   Std. Dev.       Min      Max
>> -----------+------------------------------------------------
>>        err |   74   -.0000198   2382.413  -4432.274  5771.01
>>   err_mata |   74    6.20e-06   2382.413  -4432.274  5771.01
>
>From the results above, it -err- from  Stata's -regress- 
command is 
>less accurate than -err_mata_ made on the basis of calculating 
>syminv(X'X)*X'y.  This contradicts what I have written 
previously, 
>namely that calculation-wise one can easily improve on 
>syminv(X'X)*X'y, and in fact -regress- does that.
>
>This is a really nice analysis of computation error.  It's nice, 
but
>it suffers from an error -- a trivial one -- with big 
consequences.
>Once that trivial error is fixed, the -summarize- table reads,
>
>
>    Variable |  Obs        Mean   Std. Dev.       Min      Max
>  -----------+-------------------------------------------------
>         err |   74    1.30e-12   2382.413  -4432.274  5771.01
>    err_mata |   74    8.17e-10   2382.413  -4432.274  5771.01
>
>Thus, once again, -regress- is more accurate than 
syminv(X'X)*X'y.
>
>What happened is that Michael forget that Stata defaults to 
storing
>new variables as floats whereas Mata stores everyting as 
doubles.  F
>loats are less precise then doubles.  There are three lines in
>Michael's code that need fixing:
>
>        change                  to
>        --------------------------------------------
>        predict xb              predict double xb
>        gen err=price-xb        gen double err=price-xb
>        getmata err_mata        getmata err_mata, double
>
>So let's understand what happened. 
>
>    1.  Michael used -reg price weight length- to obtain results. 
>        Those results are fully accurate.
>
>    2.  Michael then coded -predict xb-, and so obtained a less 
>        accurate recording of the prediction than he would have 
>        gotten using -predict double xb-.  The recording he got 
>        was not inaccurate; it was accurate to about 1 part in 
>        10,000,000, but that is not sufficient when accessing 
accuracy 
>        that is is 1 part in 1,000,000,000,000 (1 quadrillion).
>
>    3.  Michael than coded -gen err=price-xb-.  Variable price is 
>        recorded in full accuracy, but xb was rounded, and thus 
so was 
>        err.
>
>    4.  Put that all together, and the error appeared to be 
>        -.0000198 when it really was 1.30e-12.
>
>    5.  On the Mata side, everthing went swimmingly until 
Michael 
>        exited Mata and coded -getmata err_mata-.  The full-
precision 
>        -err_mata- variable in Mata was copied to a less than full 
>        fedility Stata float variable -err_mata-.  That loss of 
precision
>        changed the mean from 8.17e-10 to 6.20e-06.
>
>Still, it was an easy mistake to make.  More importantly, I 
don't want
>anyone thinking that they need to use -double- for their own 
work.
>Stata does all calculations in -double- regardless of how data 
are
>stored.  Storing results as -float is more than sufficient for 
most
>real-life work.  I am about to have a blog posting on exactly 
that
>subject and, as a matter of fact, my current blog posting on 
"How to
>read %21x format" at http://blog.stata.com/ is setting me up 
to address
>the subject next week.
>
>Meantime, here is the code to run if anyone wants to 
reproduce the 
>above results:
>
>-------------------------------------------
>sysuse auto, clear 
>reg price weight length 
>predict double xb 
>gen double err = price - xb 
>
>mata:
>st_view(X=., ., "weight length")
>X = X, J(rows(X), 1, 1)
>st_view(Y=., ., "price") 
>beta = invsym(X'X)*X'Y
>err_mata = Y-X*beta 
>end
>getmata err_mata, double
>sum err err_mata
>-------------------------------------------
>
>-- Bill
>[email protected]
>*
>*   For searches and help try:
>*   http://www.stata.com/help.cgi?search
>*   http://www.stata.com/support/statalist/faq
>*   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index