Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: where is StataCorp C code located? all in a single executable as compiled binary?


From   László Sándor <[email protected]>
To   [email protected]
Subject   Re: st: where is StataCorp C code located? all in a single executable as compiled binary?
Date   Tue, 20 Aug 2013 05:08:29 -0400

Thanks, Maarten.

My understanding of byable commands was that they loop over -if-
conditions anyway, though -in- conditions are supposed to be less
wasteful and would explain why the prefix requires sorted data.

Trust me, this code is heavily used on big data; if each run saves us
minutes, it is still worth it. My current tests, which max out the
code in this thread at -maxlong()- observations (the limit) and thus
20 GB of data, give -collapse- a 20-minute lead over -tab, sum-.
However, the key comparison is with the loops here, and I had not
caught that the test was biased in their favor because they did not
loop over all observations. I am rerunning those tests now.
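
For concreteness, the comparison under discussion (repeated -if- scans
versus a single -sort- followed by -in- ranges) might be sketched as
below. This is only an illustration, not the code from this thread:
the variables bin and y, the 100 bins, and the one million observations
are all made up, and the -in- ranges assume that every bin from 1 to
100 is non-empty.

```stata
* Illustrative sketch only: data, variable names, and sizes are assumptions.
clear
set seed 12345
set obs 1000000
generate long bin = ceil(100 * runiform())
generate double y = rnormal()

timer clear

* (a) -forvalues- with -if-: each pass has to check every observation
timer on 1
forvalues b = 1/100 {
    quietly summarize y if bin == `b', meanonly
    scalar m_if_`b' = r(mean)
}
timer off 1

* (b) one -sort-, then -in- ranges that touch only one bin's block
timer on 2
sort bin
* bin sizes in sorted order; row b is bin b only if no bin is empty
quietly tabulate bin, matcell(freq)
local start = 1
forvalues b = 1/100 {
    local end = `start' + freq[`b', 1] - 1
    quietly summarize y in `start'/`end', meanonly
    scalar m_in_`b' = r(mean)
    local start = `end' + 1
}
timer off 2

* both routes should give the same means, up to rounding
forvalues b = 1/100 {
    assert reldif(scalar(m_if_`b'), scalar(m_in_`b')) < 1e-10
}

timer list
```

Timer 2 includes the -sort-, so the two timings show whether the
one-off sort pays for itself at this number of bins.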

On Tue, Aug 20, 2013 at 4:21 AM, Maarten Buis <[email protected]> wrote:
> On Mon, Aug 19, 2013 at 7:30 PM, László Sándor wrote:
>> The other option seemed to be to try to keep track of the levels of
>> "bins", and just forval loop over the values, if-ing in a bin at a
>> time to quickly grab the means. This was surprisingly fast, and does
>> not seem to be any slower without a sort beforehand. Again, any
>> efficiency of -bys- looping over ifs does not seem to be worth
>> the cost of the initial sorting.
>
> I think you are mixing up advice here: -by: <something>- is likely to
> be faster than a -forvalues- loop combined with -if- conditions. I
> don't think anyone suggested that you sort before that loop. The logic
> is that an -if- condition will each time by necessity have to go
> through all observations. The alternative would be a single sort with
> -in- conditions, which I guess is what is at the core of the speed of
> the -by- prefix. Depending on how many times you want to use -if-
> conditions, there will be a point where the combination of a single
> -sort- and many -in- conditions will be quicker than many -if-
> conditions. But I don't expect that -sort-ing will help if you choose
> the -forvalues- loop combined with -if- conditions.
>
> On a pragmatic level: how much time have you now spent trying to write
> this code, and how much time do you expect to save with that? Are you
> sure that you don't end up with a net loss of time?
>
> -- Maarten
>
> ---------------------------------
> Maarten L. Buis
> WZB
> Reichpietschufer 50
> 10785 Berlin
> Germany
>
> http://www.maartenbuis.nl
> ---------------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


