Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: faster xtiling


From   László Sándor <[email protected]>
To   [email protected]
Subject   Re: st: faster xtiling
Date   Fri, 7 Sep 2012 12:16:55 -0400

Daniel, Maarten, thank you very much.

To start with Maarten's point, I'm sad to say I don't see why Stata
would skip the sort that is in xtile's Makequan subroutine. Maybe I
overlooked something. But could this be the source of Caskey's speedup
too?

Otherwise, it would be very surprising if Caskey's extensive use of
-egen- beat a well-written -_pctile- in C code. Strange.

I am also worried that Caskey's version does not reproduce exactly the
same numbers. Would -xtile- without the sort still do so?

I am not in the business of accusing StataCorp of sloppiness or
laziness. Maybe there is some arbitrariness in what _pctile does
(with ties?), so they need a preceding sort to make the result
reproducible? Not worth worrying about in my case, I think, so I'd
just drop the sort.
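
For what it's worth, here is a minimal sketch of the logic I have in mind, written in Python rather than Stata purely for illustration. It is not what -xtile- or _pctile do internally (we cannot see that). The function names cutpoints and assign_bins are made up, and the quantile convention (average two neighbours when N*k/nq is an integer) is an assumption. The point is only that the bin of an observation depends on the cutpoints, and the cutpoints depend on the set of values, not on the order of the rows.

```python
import bisect
import random


def cutpoints(values, nq):
    """Return the nq-1 quantile cutpoints of values.

    Assumed convention: with P = N*k/nq, average the P-th and (P+1)-th
    order statistics when P is an integer, otherwise take the next one up.
    """
    s = sorted(values)  # sorts a copy of this one column; rows stay put
    n = len(s)
    cuts = []
    for k in range(1, nq):
        pos, rem = divmod(n * k, nq)
        if rem == 0:
            cuts.append((s[pos - 1] + s[pos]) / 2)
        else:
            cuts.append(s[pos])
    return cuts


def assign_bins(values, cuts):
    # bin = 1 + number of cutpoints strictly below the value,
    # so a value equal to a cutpoint falls in the lower bin
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


random.seed(1)
x = [random.randint(0, 50) for _ in range(1000)]  # many ties on purpose
bins = assign_bins(x, cutpoints(x, 10))

# Shuffle the rows and redo everything from scratch.
order = list(range(len(x)))
random.shuffle(order)
x_shuffled = [x[i] for i in order]
bins_shuffled = assign_bins(x_shuffled, cutpoints(x_shuffled, 10))

# Every observation lands in the same bin regardless of row order.
same = all(bins[i] == b for i, b in zip(order, bins_shuffled))
print(same)  # True
```

Under this convention ties cause no arbitrariness: equal values always share a bin, so the result is reproducible without reordering the data.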

Thanks!

Laszlo

On Fri, Sep 7, 2012 at 11:55 AM, Daniel Brodback <[email protected]> wrote:
>
> László,
>
> while I am no "learned" or experienced member of this list and have barely
> any Stata experience, I found that Judson Caskey's version, xtileJ (you can
> find it on his page at
> http://personal.anderson.ucla.edu/judson.caskey/data.html ), gets the
> quantile job done far more efficiently. (We are talking minutes vs. hours.)
>
> Maybe you can use his version as a starting point for something of your own.
> Comparing the quantile ranks of xtile and xtileJ, my sample shows a
> correlation of .996.
>
> HTH,
> Daniel
>
>
> -------- Original message --------
> > Date: Fri, 7 Sep 2012 17:50:15 +0200
> > From: Maarten Buis <[email protected]>
> > To: [email protected]
> > Subject: Re: st: faster xtiling
>
> > On Fri, Sep 7, 2012 at 5:04 PM, László Sándor wrote:
> > > I am trying to speed up -xtile- for Stata 11 and above for all
> > > platforms (for internal use) used with tens of millions of
> > > observations.
> > >
> > > I checked the source of -xtile-, and I am not sure I understand the
> > > purpose of all of it. Most importantly, it does sort the data (a no-no
> > > with data the size of mine), even though the crucial step of _pctile
> > > does not need presorted data.
> >
> > The sorting only happens if you asked for more than 1,001 quantiles,
> > so that suggests to me that there is some limitation in _pctile that
> > makes that necessary. If it were just laziness/sloppiness, then it
> > would be extremely unlikely that the code would have been written that
> > way.
> >
> > > And while I am at it, I am also happy to hear comments about the
> > > prospects of using Mata for any of this. _pctile is built-in,
> > > optimized, tailored, tweaked, polished C code, so there is little hope
> > > that Mata might improve the crucial steps, right?
> >
> > As to the properties of -_pctile-, only StataCorp can say anything about
> > that, as we cannot see its content any more than you can.
> >
> > -- Maarten
> >
> > ---------------------------------
> > Maarten L. Buis
> > WZB
> > Reichpietschufer 50
> > 10785 Berlin
> > Germany
> >
> > http://www.maartenbuis.nl
> > ---------------------------------
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/

