Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: approximate quantiles in Stata


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: approximate quantiles in Stata
Date   Sat, 24 Aug 2013 20:31:43 +0000

Laszlo,

My guess is that Stata has already optimized -sort- about as much as it can be optimized.  The algorithm you linked to earlier appeared to be more geared at calculating quantiles in real time (in a data stream) rather than in a static sample.  Given that a Stata data set is already in memory (and getting it into memory takes a significant amount of time in and of itself for a large data set), I doubt if that algorithm is going to buy you anything above and beyond using the standard Stata sort.

This is an intriguing problem, though, one which I would probably benefit from as well (we typically process datasets with millions of observations as well), so I will keep my eyes open for quantile algorithms.  Keep us posted if you find anything interesting that could easily be turned into a user-written algorithm,

Regards,
Joe Canner
Johns Hopkins University School of Medicine
________________________________________
From: [email protected] [[email protected]] on behalf of László Sándor [[email protected]]
Sent: Saturday, August 24, 2013 12:03 PM
To: [email protected]
Subject: Re: st: approximate quantiles in Stata

Also, I received a quick comment from theoretical CS people that the
quantile thing could be faster than the quoted speed of sorting in
Stata (and xtile does sort): "It is possible to find the quantiles in
O(N) using a straightforward modification of the selection algorithm
based on quicksort (see
http://en.wikipedia.org/wiki/Selection_algorithm)."
http://cstheory.stackexchange.com/questions/18732/algorithms-to-split-data-into-roughly-equal-sized-quantiles/

Is there anything to know about QuickSort in Stata?

That said, I cannot write up a fancy algorithm to find the quantiles,
let alone a whole new -sort-, so probably the question is still about
sensible downsampling.

On Sat, Aug 24, 2013 at 7:42 AM, László Sándor <[email protected]> wrote:
> Thanks, David,
>
> I think the typical use case is about tens of millions of
> observations. (And as I think it matters for precision, the typical
> case is about 20 bins, or vingtiles.)
>
> FWIW, I also tried to profile -xtile- with maximum number of
> observations possible. With Stata 13 running on 64 cores, it took 7
> hours to generate vingtiles.
>
> Laszlo
>
> On Fri, Aug 23, 2013 at 9:54 PM, David Hoaglin <[email protected]> wrote:
>> Hi, Laszlo.
>>
>> How large are your samples, and which quantiles do you need?
>>
>> I think I saw some relevant work a number of years ago, and I will
>> have to look for it.
>>
>> David Hoaglin
>>
>> On Fri, Aug 23, 2013 at 9:01 PM, László Sándor <[email protected]> wrote:
>>> Hi,
>>> My work is slowed down by the precise but computationally intensive
>>> quantile calculation of Stata. I am curious if there are any
>>> approximation algorithms implemented out there, something along these
>>> ideas: http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/
>>>
>>> So this is not about estimating population quantiles from a small
>>> sample (see Nick's hdquantile on SSC, e.g.). This is about finding
>>> approximate quantiles in large data.
>>>
>>> If the answer is simply random downsampling before taking quantiles, I
>>> would still appreciate some guidance on how heavily to downsample as a
>>> function of population size.
>>>
>>> Thanks!
>>>
>>> Laszlo
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index