Re: st: announcing statsbyfast - a faster statsby
Michael Blasnik <firstname.lastname@example.org> notes that a faster version of
-statsby- that uses -in- rather than -if-, -statsbyfast-, is now available.
> The new ado -statsbyfast- is now posted on SSC for downloading.
> statsbyfast is statsby with fewer than 10 lines of changed code and
> it works identically as far as I can tell.
I will note that two of us here at StataCorp had been following this thread
and considered changing -statsby- to use -in- for most cases. I haven't
thought carefully about Michael's solution, but a quick look shows that it is
different from what we were considering and much more ingenious.
It will, however, suffer from one potential problem that our solution also
suffered from. If the command being executed for each group changes the sort order
of the data, then the results will become indeterminate, and generally not
what you want. This will never happen with official Stata estimators or for
the vast majority of Stata commands, because they are "sort stable". It is
incredibly easy for users to make their own commands sort stable as well: just
add the -sortpreserve- option to the end of the -program XYZ- statement.
Even so, many old user-written commands will not be sort stable.
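
For illustration, here is a minimal sketch of a sort-stable user command. The program name -mymiddle- and what it computes are made up for this example; only the -sortpreserve- option is the point. The program sorts the data internally, and Stata restores the original order when it exits.

```stata
* Hypothetical user-written command; the name and purpose are illustrative only.
program define mymiddle, sortpreserve
    version 8
    syntax varname
    * This sort would normally disturb the caller's sort order ...
    sort `varlist'
    * ... but the result below depends on it: show the middle observation.
    display "middle value of `varlist': " `varlist'[ceil(_N/2)]
end
```

Because of -sortpreserve-, a caller that runs -mymiddle price- on data sorted by some other variable should find the data in that same order afterwards.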
Since Michael is apparently much faster at responding to this than we are, he
might consider adding the "fix" we had in mind for the sorting problem. (We
were going to warn Michael that we might change -statsby-, but he was so fast
that before we finished considering our options he had posted a darn good
This will make perfect sense to Michael since he is already using most of
these tricks. First, keep the old PostGroups subprogram and rename it, say,
PostGroupsSlowly. After each run of the command check that the sort order is
what it was when you started. After the first run, this is probably a waste
of time, but it is a fast comparison of two, usually short, string macros. If
the sort order has changed, throw away everything and call -statsbyfast- again with a special
hidden option. When the hidden option is seen, reroute through
PostGroupsSlowly, rather than the now faster PostGroups.
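
The sort-order check described above might look roughly like the following sketch. It is not the code in -statsby- or -statsbyfast-; the macro names and the example command are assumptions, and it only compares the list of sort variables reported by the -sortedby- extended macro function, so it will not notice a command that reorders observations while leaving that list unchanged.

```stata
* Sketch only: macro names and the example command are hypothetical.
sysuse auto, clear
sort foreign

local cmd "summarize price"      // stands in for the user's per-group command

local before : sortedby          // sort variables before the run
`cmd'
local after : sortedby           // sort variables after the run

if "`before'" != "`after'" {
    * Here the real program would discard its results and rerun
    * through the slower -if- based path (PostGroupsSlowly).
    display as error "sort order changed; falling back to the slow path"
}
```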
We are very likely to add this speed up to -statsby-, but in our own good