Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: benchmarking a profile of an ado file


From   László Sándor <[email protected]>
To   [email protected]
Subject   st: benchmarking a profile of an ado file
Date   Thu, 19 Sep 2013 18:52:53 -0400

Hi all,

I tested our ado file that produces binned scatter graphs. I am really
curious whether seasoned Stata programmers find the following runtimes
reasonable, or whether they suggest something is wrong with our code.

I used -profiler- to see how much time various operations took. We
take two uniform random variables of length maxlong()/2, cut one of
them into 20 equal-sized bins using its quantiles, then calculate the
means of both variables within each bin, and plot those against each
other alongside the fitted line from a linear regression on the
original data, all on 64 cores running Stata/MP 13.
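
For reference, the setup described above could be reproduced roughly as
follows. This is a minimal sketch under stated assumptions, not the ado
file being profiled: the wrapper name bench_binscatter is hypothetical,
it uses the stock -xtile- rather than our replacement, it omits the
plot, and it uses far fewer observations so that it finishes quickly.

```stata
clear all
set seed 12345
set obs 100000                      // the post used maxlong()/2 observations; far fewer here

generate double x = runiform()
generate double y = runiform()

capture program drop bench_binscatter
program define bench_binscatter     // hypothetical wrapper so -profiler- has a program to time
    xtile bin = x, nquantiles(20)   // stock -xtile-, not the replacement mentioned in the post
    tabulate bin, summarize(x)      // bin means of x
    tabulate bin, summarize(y)      // bin means of y
    regress y x                     // line to overlay on the binned points
end

profiler clear
profiler on
bench_binscatter
profiler off
profiler report
```

Raising the -set obs- value toward maxlong()/2 would approximate the
original run, memory permitting.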

The longest operations are by far the two -tabulate bins,
summarize(variable)- instances we need to run to generate means for
the bins. I might need to live with this bottleneck. In this case, all
this took 47 minutes, so nothing else would really matter.

That said, there are some other timings I don't understand.

Generating the vingtiles wasn't that bad, at least with our own
replacement for -xtile-: this completes in 1000 s. Even -regress-ing
one variable against the other took less than two minutes. (Isn't
-tab, sum- surprisingly slow compared to this?)
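
The -tab, sum- versus -regress- comparison can also be isolated outside
-profiler- with -timer-. A minimal sketch, assuming a much smaller
sample than in the run above and the stock -xtile- for the bins:

```stata
clear all
set seed 12345
set obs 100000                        // far fewer than the maxlong()/2 in the post
generate double x = runiform()
generate double y = runiform()
xtile bin = x, nquantiles(20)

timer clear
timer on 1
quietly tabulate bin, summarize(y)
timer off 1
timer on 2
quietly regress y x
timer off 2
timer list                            // timer 1: -tab, sum-; timer 2: -regress-
```

Timings at this size need not reflect how the two commands scale to a
billion observations.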

But relative to these, I don't understand how the following came
anywhere close, esp. if they would not scale down with smaller data:

1. the sort within -serset- took another 3 minutes, which is too
little time to sort the entire data, but then I don't understand what
it did at all.
2. _fr_runlog took another 2 minutes.
3. a "twoway__function_gen" took 84 s, a GenData 72 s, both within
"twoway__function_gen". If -twoway function- is combined with a
-scatteri-, why doesn't it generate 20 data points? Did it generate 1
billion? If so, is there an "immediate" use of -twoway function-?
4. even -twoway__scatteri_serset- took almost a minute, even though we
gave it 20 pairs of values.
5. "_populate_stylenames" for "style" took 12 seconds?
6. a "parse" for "twowaygraph_g" took 14 seconds? How come?
7. another "parse" for a "twoway_function_parse" took another 10
seconds. Is this typical? Necessary?
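
The kind of graph call behind items 3 and 4 looks roughly like the
following. This is an illustrative sketch, not the exact command our
ado file issues: the three (y, x) pairs are made-up stand-ins for the
20 bin means, and the data are much smaller. The n() option of -twoway
function- sets the number of points at which the function is evaluated
(300 by default, according to the documentation).

```stata
clear all
set seed 12345
set obs 100000                        // far fewer than the maxlong()/2 in the post
generate double x = runiform()
generate double y = runiform()

quietly regress y x
local a = _b[_cons]                   // intercept of the fitted line
local b = _b[x]                       // slope of the fitted line

// Made-up (y, x) pairs stand in for the 20 bin means.
twoway (scatteri 0.49 0.1  0.50 0.5  0.51 0.9)              ///
       (function y = `a' + `b'*x, range(0 1) n(20))
```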

Hopefully this discussion is useful for someone else, or for future reference.

Thanks!

Laszlo