Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Re: Comparison of results from Stata and Mata
From
Matthew J Baker <[email protected]>
To
[email protected]
Subject
Re: st: Re: Comparison of results from Stata and Mata
Date
Sun, 6 Feb 2011 16:53:33 -0500 (EST)
Bill --
That is a most illuminating answer! One would think that I would
learn after all these years of being bitten by not putting in
"double" where I should (or for trying to take the natural log of
zero, for that matter)...
In any case: this begs the question: what is "reg" doing
differently? While running your code makes the errors much
closer, they still aren't identical - why is that?
Dr. Matthew J. Baker
Department of Economics
Hunter College and the Graduate Center, CUNY
---- Original message ----
>Date: Fri, 04 Feb 2011 10:23:27 -0600
>From: [email protected] (on behalf of
[email protected] (William Gould, StataCorp LP))
>Subject: st: Re: Comparison of results from Stata and Mata
>To: [email protected]
>
>Matthew Baker <[email protected]> wrote,
>
>> I have noticed that there can be a (small) discrepancy
between the
>> results produced by Stata's predict command after running a
regression
>> and what I think should generate identical results in Mata. As
an
>> illustration, when running the following code:
>>
>> sysuse auto, clear
>> reg price weight length
>> predict xb
>> gen err=price-xb
>>
>> mata
>> st_view(X=.,.,"weight length")
>> X=X,J(rows(X),1,1)
>> st_view(Y=.,.,"price")
>> beta=invsym(X'X)*X'Y
>> err_mata=Y-X*beta
>> end
>>
>> getmata err_mata
>> sum err err_mata
>>
>> I find that the results of the sum command are as follows
[...]
>>
>> Variable | Obs Mean Std. Dev. Min Max
>> -----------+------------------------------------------------
>> err | 74 -.0000198 2382.413 -4432.274 5771.01
>> err_mata | 74 6.20e-06 2382.413 -4432.274 5771.01
>
>From the results above, it -err- from Stata's -regress-
command is
>less accurate than -err_mata_ made on the basis of calculating
>syminv(X'X)*X'y. This contradicts what I have written
previously,
>namely that calculation-wise one can easily improve on
>syminv(X'X)*X'y, and in fact -regress- does that.
>
>This is a really nice analysis of computation error. It's nice,
but
>it suffers from an error -- a trivial one -- with big
consequences.
>Once that trivial error is fixed, the -summarize- table reads,
>
>
> Variable | Obs Mean Std. Dev. Min Max
> -----------+-------------------------------------------------
> err | 74 1.30e-12 2382.413 -4432.274 5771.01
> err_mata | 74 8.17e-10 2382.413 -4432.274 5771.01
>
>Thus, once again, -regress- is more accurate than
syminv(X'X)*X'y.
>
>What happened is that Michael forget that Stata defaults to
storing
>new variables as floats whereas Mata stores everyting as
doubles. F
>loats are less precise then doubles. There are three lines in
>Michael's code that need fixing:
>
> change to
> --------------------------------------------
> predict xb predict double xb
> gen err=price-xb gen double err=price-xb
> getmata err_mata getmata err_mata, double
>
>So let's understand what happened.
>
> 1. Michael used -reg price weight length- to obtain results.
> Those results are fully accurate.
>
> 2. Michael then coded -predict xb-, and so obtained a less
> accurate recording of the prediction than he would have
> gotten using -predict double xb-. The recording he got
> was not inaccurate; it was accurate to about 1 part in
> 10,000,000, but that is not sufficient when accessing
accuracy
> that is is 1 part in 1,000,000,000,000 (1 quadrillion).
>
> 3. Michael than coded -gen err=price-xb-. Variable price is
> recorded in full accuracy, but xb was rounded, and thus
so was
> err.
>
> 4. Put that all together, and the error appeared to be
> -.0000198 when it really was 1.30e-12.
>
> 5. On the Mata side, everthing went swimmingly until
Michael
> exited Mata and coded -getmata err_mata-. The full-
precision
> -err_mata- variable in Mata was copied to a less than full
> fedility Stata float variable -err_mata-. That loss of
precision
> changed the mean from 8.17e-10 to 6.20e-06.
>
>Still, it was an easy mistake to make. More importantly, I
don't want
>anyone thinking that they need to use -double- for their own
work.
>Stata does all calculations in -double- regardless of how data
are
>stored. Storing results as -float is more than sufficient for
most
>real-life work. I am about to have a blog posting on exactly
that
>subject and, as a matter of fact, my current blog posting on
"How to
>read %21x format" at http://blog.stata.com/ is setting me up
to address
>the subject next week.
>
>Meantime, here is the code to run if anyone wants to
reproduce the
>above results:
>
>-------------------------------------------
>sysuse auto, clear
>reg price weight length
>predict double xb
>gen double err = price - xb
>
>mata:
>st_view(X=., ., "weight length")
>X = X, J(rows(X), 1, 1)
>st_view(Y=., ., "price")
>beta = invsym(X'X)*X'Y
>err_mata = Y-X*beta
>end
>getmata err_mata, double
>sum err err_mata
>-------------------------------------------
>
>-- Bill
>[email protected]
>*
>* For searches and help try:
>* http://www.stata.com/help.cgi?search
>* http://www.stata.com/support/statalist/faq
>* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/