Re: st: faster xtiling
"Daniel Brodback" <firstname.lastname@example.org>
Fri, 07 Sep 2012 17:55:32 +0200
While I am not a "learned" or experienced member of this list and have barely any Stata experience, I found that Judson Caskey's version, xtileJ (available on his page at http://personal.anderson.ucla.edu/judson.caskey/data.html), gets the quantile job done far more efficiently. (We are talking minutes vs. hours.)
Maybe you can use his version as a starting point for something of your own. Comparing the quantile ranks from xtile and xtileJ, my sample shows a correlation of .996.
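For what it is worth, the no-sort idea from the quoted message (get the cutpoints from -_pctile-, then assign groups with one pass per cutpoint) could be sketched roughly as below. This is not xtileJ's code and I have not timed it; it is only an illustration. The variable names x and q are made up, and the sketch assumes no more than 1,001 quantiles, which as far as I know is the most that -_pctile- accepts.

```stata
* Sketch only: x (input) and q (output) are hypothetical variable names.
local nq 10                       // number of quantile groups (assumed <= 1001)
local ncut = `nq' - 1             // _pctile returns nq - 1 cutpoints

* Everyone with a nonmissing x starts in group 1; missing x stays missing.
generate int q = 1 if !missing(x)

* Cutpoints are left in r(r1), r(r2), ..., r(r`ncut'); the data are not re-sorted.
quietly _pctile x, nquantiles(`nq')

* Move an observation up one group for each cutpoint it strictly exceeds,
* so group i holds values in (cutpoint i-1, cutpoint i].
forvalues i = 1/`ncut' {
    quietly replace q = q + 1 if x > r(r`i') & !missing(x)
}
```

This makes one pass over the data per cutpoint, so it should suit a modest number of quantiles on a very large dataset better than a request for hundreds of groups.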
-------- Original Message --------
> Date: Fri, 7 Sep 2012 17:50:15 +0200
> From: Maarten Buis <email@example.com>
> To: firstname.lastname@example.org
> Subject: Re: st: faster xtiling
> On Fri, Sep 7, 2012 at 5:04 PM, László Sándor wrote:
> > I am trying to speed up -xtile- for Stata 11 and above for all
> > platforms (for internal use) used with tens of millions of
> > observations.
> > I checked the source of -xtile-, and I am not sure I understand all
> > its purpose. Most importantly, it does sort the data (a no-no with
> > data the size of mine), even though the crucial step of _pctile does
> > not need presorted data.
> The sorting only happens if you asked for more than 1,001 quantiles,
> so that suggests to me that there is some limitation in _pctile that
> makes that necessary. If it were just laziness/sloppiness, then it
> would be extremely unlikely that the code would have been written that way.
> > And while I am at it, I am also happy to hear comments about the
> > prospects of using Mata for any of this. _pctile is built-in,
> > optimized, tailored, tweaked, polished C code, so there is little hope
> > that Mata might improve the crucial steps, right?
> As to the properties of -_pctile-, only StataCorp can say anything about
> that, as we cannot see its content any more than you can.
> -- Maarten
> Maarten L. Buis
> Reichpietschufer 50
> 10785 Berlin
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/