Ben Jann <ben.jann@gmail.com> asks,
> A 49% reduction in run time still makes Mata 20 times slower than the
> plugin. Can you explain to us where the performance gets lost?
Not yet. My first guess was it had to do with subscripting, but I ran
tests this morning and dismissed that:
    Statement          execution time    compared to (1)
    ----------------------------------------------------------
    (1)  x = a         1.06e-06 secs
    (2)  x = a[k]      1.19e-06 secs     +0.13e-06 secs
    (3)  x = a[k,m]    1.23e-06 secs     +0.17e-06 secs
    (4)  x[k] = a      1.07e-06 secs     +0.01e-06 secs
    (5)  x[k,m] = a    1.12e-06 secs     +0.06e-06 secs
    ----------------------------------------------------------
The results above suggest access of subscripted values could be sped
up (compare (2) to (4) and (3) to (5)), and I'll look into that, but the total
contribution will not be much.
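
As a rough check on how small that contribution is, the sketch below recomputes the overhead column from the timings in the table and expresses each one as a percentage of the plain assignment. It is written in Python purely for the arithmetic, not in Mata, and the variable names are illustrative.

```python
# Timings copied from the table above, in seconds per statement.
base = 1.06e-06  # (1) x = a
timings = {
    "x = a[k]":   1.19e-06,  # (2)
    "x = a[k,m]": 1.23e-06,  # (3)
    "x[k] = a":   1.07e-06,  # (4)
    "x[k,m] = a": 1.12e-06,  # (5)
}

# Extra cost of each subscripted form relative to the plain assignment.
overhead = {stmt: t - base for stmt, t in timings.items()}

# The same extra cost as a percentage of the plain assignment.
percent = {stmt: 100 * extra / base for stmt, extra in overhead.items()}

for stmt in timings:
    print(f"{stmt:<12} +{overhead[stmt]:.2e} secs  ({percent[stmt]:.1f}% over x = a)")
```

Even the slowest form, x = a[k,m], costs well under a fifth more than the unsubscripted assignment, which is nowhere near a factor of 20.
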
I already know that the -for (i=1; i<n; i++)- construct is much faster in C
than in Mata. I haven't done the timings recently, but my recollection
is a factor of 100 or perhaps 200. That's due to C's use of 4-byte
integers rather than 8-byte doubles for the counters. Adding 4-byte
integers to Mata is on the development list, but not high on it.
Factors like 100 and 200 sound gigantic, but in this case, they do not
amount to much. Both Mata and C spend little time executing
the construct, not because it rarely happens, but because the construct is
already so fast.
That's the problem with relative timings. One can find loops where
the relative differences are large and absolute differences small.
Until Mata translates all the way down to machine language -- and that is
*NOT* in the current plans but is theoretically possible -- Mata cannot
achieve the timings of C on a statement-by-statement basis. There will
always be overhead. In the scientific applications for which Mata is
designed, however, that may not matter. It does not matter if most of
the execution time is going into library functions such as matrix inversion,
etc., and those functions run the same speed in both environments.
All of which is to say, you have to be careful looking at individual statements
or you will spend all your time getting big relative improvements that, when
you add them all up, amount to little. There is far more gain from finding
that statement with an overhead rate of 4 that consumes lots of time than the
statement with an overhead rate of 200 that consumes little.
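
A minimal sketch of that comparison follows, in Python only for the arithmetic. The two statements, their overhead rates, and their run times are invented for illustration and are not measurements. It also assumes that "overhead rate" means Mata time divided by C time for the same statement.

```python
# Hypothetical profile of two statements in a Mata program.
# "overhead" is Mata time divided by C time for the same statement;
# "mata_secs" is the total time the program spends in that statement.
# All numbers are invented for illustration.
statements = {
    "loop counter": {"overhead": 200, "mata_secs": 0.0001},
    "inner update": {"overhead": 4,   "mata_secs": 0.0200},
}


def max_saving(overhead, mata_secs):
    """Seconds saved if the statement ran at C speed (overhead of 1)."""
    return mata_secs * (1 - 1 / overhead)


savings = {name: max_saving(**s) for name, s in statements.items()}

for name, secs in savings.items():
    rate = statements[name]["overhead"]
    print(f"{name:<13} overhead {rate:>3}x  saves at most {secs:.7f} secs")
```

With these made-up numbers, removing all the overhead from the statement with a rate of 4 recovers about 0.015 seconds. Doing the same for the statement with a rate of 200 recovers about 0.0001 seconds.
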
In the case of the code Brendan Halpin <brendan.halpin@ul.ie> reported,
execution time was 0.0228 seconds after improvement. That was with 100x100
matrices, and I wonder how large the matrices were that Brendan was using. In
any case, running 20 times faster would reduce the run time to 0.00114 seconds,
a savings of 0.02166 seconds. That absolute number, while not large, is big
enough to catch my interest, which is why I've been timing individual
statements this morning.
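
The arithmetic behind those two figures, as a short Python sketch (variable names are illustrative):

```python
mata_secs = 0.0228   # run time after improvement, 100x100 matrices
speedup = 20         # factor by which the plugin is faster, per Ben's figure

target_secs = mata_secs / speedup      # run time if Mata matched the plugin
saving_secs = mata_secs - target_secs  # absolute time recovered per run

print(f"target {target_secs:.5f} secs, saving {saving_secs:.5f} secs")
```
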
-- Bill
wgould@stata.com