Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[no subject]

This is a really nice analysis of computation error.  It's nice, but
it suffers from an error -- a trivial one -- with big consequences.
Once that trivial error is fixed, the -summarize- table reads,


    Variable |  Obs        Mean   Std. Dev.       Min      Max
  -----------+-------------------------------------------------
         err |   74    1.30e-12   2382.413  -4432.274  5771.01
    err_mata |   74    8.17e-10   2382.413  -4432.274  5771.01

Thus, once again, -regress- is more accurate than syminv(X'X)*X'y.

What happened is that Michael forget that Stata defaults to storing
new variables as floats whereas Mata stores everyting as doubles.  F
loats are less precise then doubles.  There are three lines in
Michael's code that need fixing:

        change                  to
        --------------------------------------------
        predict xb              predict double xb
        gen err=price-xb        gen double err=price-xb
        getmata err_mata        getmata err_mata, double

So let's understand what happened. 

    1.  Michael used -reg price weight length- to obtain results. 
        Those results are fully accurate.

    2.  Michael then coded -predict xb-, and so obtained a less 
        accurate recording of the prediction than he would have 
        gotten using -predict double xb-.  The recording he got 
        was not inaccurate; it was accurate to about 1 part in 
        10,000,000, but that is not sufficient when accessing accuracy 
        that is is 1 part in 1,000,000,000,000 (1 quadrillion).

    3.  Michael than coded -gen err=price-xb-.  Variable price is 
        recorded in full accuracy, but xb was rounded, and thus so was 
        err.

    4.  Put that all together, and the error appeared to be 
        -.0000198 when it really was 1.30e-12.

    5.  On the Mata side, everthing went swimmingly until Michael 
        exited Mata and coded -getmata err_mata-.  The full-precision 
        -err_mata- variable in Mata was copied to a less than full 
        fedility Stata float variable -err_mata-.  That loss of precision
        changed the mean from 8.17e-10 to 6.20e-06.

Still, it was an easy mistake to make.  More importantly, I don't want
anyone thinking that they need to use -double- for their own work.
Stata does all calculations in -double- regardless of how data are
stored.  Storing results as -float is more than sufficient for most
real-life work.  I am about to have a blog posting on exactly that
subject and, as a matter of fact, my current blog posting on "How to
read %21x format" at http://blog.stata.com/ is setting me up to address
the subject next week.

Meantime, here is the code to run if anyone wants to reproduce the 
above results:

-------------------------------------------
sysuse auto, clear 
reg price weight length 
predict double xb 
gen double err = price - xb 

mata:
st_view(X=., ., "weight length")
X = X, J(rows(X), 1, 1)
st_view(Y=., ., "price") 
beta = invsym(X'X)*X'Y
err_mata = Y-X*beta 
end
getmata err_mata, double
sum err err_mata
-------------------------------------------

-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: st: Re: Comparison of results from Stata and Mata
Next by Date: RE: st: SPLINE commands
Previous by thread: st: Re: Comparison of results from Stata and Mata
Next by thread: st: update proprcspline available from SSC
Index(es):
- Date
- Thread