Re: RE: st: stata 10 mp2 vs stata9
email@example.com (Vince Wiggins, StataCorp)
Sat, 25 Aug 2007 10:21:27 -0500
Jitian Sheu <firstname.lastname@example.org> sends more information about his timings
between Stata 9 and Stata 10/MP.
His "program" is two -generate- statements with 22 string comparisons.
> . gen byte plavix_tmp=(order_code=="B022932100")
> . gen byte ticlo_tmp=(order_code=="A031596100" |
> order_code=="A033091100" |
> order_code=="A028034100" |
> order_code=="A033675100" |
> order_code=="B018857100" )
And, he is running it on a 1.75 GB dataset. Jitian has now run his timings
using different amounts of memory allocated to Stata. His timings comparing
64-bit Stata 9 and 64-bit Stata 10/MP are:
    Memory    Stata 9    Stata 10
    ------    -------    --------
      2.2      66.454       N/A
      2.4      69.063     57.969
      2.6      73.094     49.719
      2.8      77.844     54.735
      3.0      81.500     58.781
In the email that started this thread, Jitian said, "Stata 10 is actually 1.5
times slower that Stata 9". I have checked Jitian's latest email 3 times now,
and I admit that perhaps I am a little slow on this Saturday morning, but it
looks to me like all of the Stata 10 runtimes are faster than any of the Stata
9 runtimes. Stata 10/MP looks to be between 14% and 33% faster than Stata 9
on Jitian's program.
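
For anyone who wants to check the percentages, here is a small sketch that retypes the four rows where both timings exist and computes the reduction in run time relative to Stata 9. The variable names are mine, not Jitian's, and "faster" is taken here to mean "less run time", which is only one possible definition. On that definition the gains come out at roughly 16% to 32%, inside the range quoted above.

```stata
* Timings copied from Jitian's table (rows with both versions reported).
clear
input memory stata9 stata10
2.4 69.063 57.969
2.6 73.094 49.719
2.8 77.844 54.735
3.0 81.500 58.781
end

* Percent reduction in run time, relative to the Stata 9 timing.
generate pct_less_time = 100 * (stata9 - stata10) / stata9
format pct_less_time %5.1f

list, noobs clean
```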
I am guessing that Jitian is comparing Stata 10/MP2 to Stata 9/SE, though he
has never said whether his Stata 9 is MP or not. Others at StataCorp have
been guessing that he is running MP for both Stata 9 and Stata 10.
These timings suggest to me that Jitian is indeed running Stata 9/SE and Stata
10/MP2 and that the improved speed under Stata 10 is because Stata 10 is using
both of his machine's processors/cores when executing the -generate-
statements.
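
One way to settle the guessing is to ask each copy of Stata what it is and then time the two statements directly. The sketch below assumes a reasonably recent Stata, where c(MP), c(processors), -timer-, and runiform() are available. The dataset is a synthetic stand-in (the order codes come from Jitian's post, but the mix of values and the number of observations are made up), so the timings will not match his.

```stata
* Which Stata is this, and how many cores is it set to use?
display "Stata version: " c(stata_version)
display "Stata/MP (1=yes, 0=no): " c(MP)
display "Cores in use:  " c(processors)

* Synthetic stand-in for the real data; the 10%/90% split is an assumption.
clear
set obs 1000000
set seed 2007
generate str10 order_code = cond(runiform() < 0.1, "B022932100", "A031596100")

* Time each -generate- separately.
timer clear
timer on 1
generate byte plavix_tmp = (order_code == "B022932100")
timer off 1

timer on 2
generate byte ticlo_tmp = (order_code == "A031596100" | ///
    order_code == "A033091100" |                        ///
    order_code == "A028034100" |                        ///
    order_code == "A033675100" |                        ///
    order_code == "B018857100")
timer off 2

* Elapsed seconds for timers 1 and 2.
timer list
```

The /// line continuations work in a do-file, not when typing at the command prompt. Running the same do-file under both installations, with the real data in place of the synthetic block, would give a like-for-like comparison.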
Jitian also wonders why Stata 10 needs more memory than Stata 9 for the same
problem. Recall that Stata 10 has a new file format, and that is largely to
support the new time-and-date format variables. This new storage format
requires slightly more space to store Jitian's data in memory.
As Jitian notes, the runtimes are dependent on the amount of memory allocated
to Stata. This does not surprise me as much as it does Jitian. Jitian's
program spends most of its time running through the dataset, because his
commands do very little work on each record. This means that the computer will be
spending most of its time moving data from pretty fast normal memory to
incredibly fast cache memory that is directly on the processor chip. As we
change the amount of memory allocated to Stata, Jitian's data is distributed
differently in memory and the caching process may be more or less efficient.
This also explains why Stata 10/MP is only 14% to 33% faster, when normally I
would expect 50-100% faster. Because Jitian's computations are fairly fast,
particularly the first -generate- statement, most of the computer's time is
spent pulling data from regular memory to cache memory, rather than processing
it.
If I'm wrong about Jitian using Stata 10/MP and Stata 9/SE, then the
cache effect could still explain the speed differences. Though if this is the
case, it is curious that Stata 10 is so much faster than Stata 9.
With modern computers, timings can be heavily influenced by these cache
effects. Computationally intense processes, where heavy-duty computations are
done on each record, are less affected by such caching, but Jitian's -generate-
statements are not really heavy-duty. Though it makes some timing comparisons
difficult, you do not want chip makers to stop this trend toward large on-chip
caches. Much of the increase in computer speed in recent years can be attributed
to improved use of cache and not to higher processor clock speeds.
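
The difference between light and heavy per-record work can be seen with a rough experiment like the one below. It is a sketch only: it assumes Stata/MP with at least two licensed cores (for -set processors-), uses synthetic data, and the "heavy" expression is just an arbitrary pile of arithmetic I picked to keep the processor busy. How large the gap turns out to be will depend on the machine and its caches.

```stata
* Synthetic data; the number of observations is arbitrary.
clear
set obs 5000000
set seed 12345
generate double x = runiform()

* Timers 1 and 2: a cheap comparison, on 1 core and then on 2 cores.
* Timers 3 and 4: heavier arithmetic on every record, on 1 and then 2 cores.
timer clear
forvalues p = 1/2 {
    set processors `p'

    timer on `p'
    generate byte cheap`p' = (x > 0.5)
    timer off `p'

    local t = `p' + 2
    timer on `t'
    generate double heavy`p' = exp(sin(x)) + lngamma(1 + x) + sqrt(x)
    timer off `t'
}

* Compare timer 1 with 2, and timer 3 with 4.
timer list
```

If the reasoning above holds, the ratio of timer 3 to timer 4 should be closer to 2 than the ratio of timer 1 to timer 2, because the cheap statement spends more of its time waiting on memory.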