



Re: st: approximate quantiles in Stata


From   David Hoaglin <[email protected]>
To   [email protected]
Subject   Re: st: approximate quantiles in Stata
Date   Sun, 25 Aug 2013 16:54:08 -0400

Laszlo,

You're welcome.

The comments about the quality of the sample seem rather vague.  I
didn't dig for a specific measure of "quality."  Working with an
incoming stream of data makes the problem more challenging.  You're
fortunate to have the entire "population" already.

If the estimate of a quantile is to have a specified variance, the
necessary sample size will be larger for more-extreme quantiles.
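
For reference, the point above can be made concrete with the standard large-sample approximation for the variance of a sample quantile. The symbols are notation assumed here, not taken from the thread: p is the quantile level, n the sample size, f the population density, \xi_p the true p-th quantile, and \sigma_0^2 the target variance.

```latex
\operatorname{Var}\!\left(\hat{\xi}_p\right) \approx \frac{p(1-p)}{n\, f(\xi_p)^2}
\qquad\Longrightarrow\qquad
n \approx \frac{p(1-p)}{\sigma_0^2\, f(\xi_p)^2}
```

As an illustration under the assumption of a standard normal population, the factor p(1-p)/f(\xi_p)^2 is roughly 1.57 at the median and roughly 4.47 at the 5th percentile, so the same target variance needs nearly three times as many observations at p = 0.05. The density in the tails shrinks faster than p(1-p) does, which is why the more-extreme quantiles are the expensive ones.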

You have not explained what you plan to do with your 20 bins.  A
sample of suitable size would give you estimates of the boundaries of
the bins (i.e., the 19 quantiles).  Then a single pass over the
population would give you the exact number of data values in each bin.
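
The two-step idea above (estimate the 19 boundaries from a sample, then count exactly in one pass) can be sketched as follows. This is only a minimal illustration in Python rather than Stata; the function names, the simulated normal "population," and the sample size of 10,000 are all assumptions made for the example, not anything from the thread.

```python
import bisect
import random
import statistics


def estimate_cut_points(population, sample_size, n_bins=20, seed=0):
    """Estimate the n_bins - 1 bin boundaries from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    # statistics.quantiles returns n - 1 cut points for n equal-probability bins
    return statistics.quantiles(sample, n=n_bins)


def count_bins(population, cut_points):
    """Single pass over the population: exact number of values in each bin."""
    counts = [0] * (len(cut_points) + 1)
    for x in population:
        # bisect_right gives the index of the bin that x falls into
        counts[bisect.bisect_right(cut_points, x)] += 1
    return counts


# Hypothetical data standing in for the full "population".
rng = random.Random(1)
population = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

cuts = estimate_cut_points(population, sample_size=10_000)
counts = count_bins(population, cuts)
```

Because the boundaries are only estimates, the exact counts will differ somewhat from one-twentieth of the population; the size of those differences is a direct check on whether the sample was large enough.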

David Hoaglin

On Sun, Aug 25, 2013 at 11:26 AM, László Sándor <[email protected]> wrote:
> Thanks, David.
>
> I think I found a reference about quantiles from downsampling, only
> with a little clarification needed. I think I see the point about why
> the size of the sample matters and not the sampling rate. See the
> discussion in this CStheory answer:
> http://cstheory.stackexchange.com/a/18734/17375
>
> Or, staying closer to our stats brethren, I edited my question on Cross
> Validated with my current concerns:
> http://stats.stackexchange.com/questions/68208/how-should-sampling-ratios-to-estimate-quantiles-change-with-population-size
>
> On the point about the tails: I think equal-sized bins provide a good
> summary of a distribution, or of a correlation in a binned scatter plot.
> Unequal bins can confuse people just as much, because of the unequal
> precision of the bin means (true, it is not only the bin size that
> drives it; the tails will have larger standard deviations).
>
> But thanks!
