



RE: st: RE: kdensity with few (/aggregated) data points


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: kdensity with few (/aggregated) data points
Date   Thu, 1 Jul 2010 10:50:06 +0100

If quantile estimation is the real problem, then I think it is much
better to do it directly. Going via the density function is like going
from New York to Boston via DC. 
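The "direct" route can be made concrete with a minimal sketch. It is written in Python purely for illustration, and the helper name `sample_quantile` is hypothetical. It estimates a quantile straight from the order statistics by linear interpolation, which is one common definition (for example, the default in R and NumPy). -centile- and -pctile- use their own definitions, which are documented in their help files.

```python
import math


def sample_quantile(data, p):
    """Estimate the p-th quantile (0 <= p <= 1) directly from the data.

    Linear interpolation between adjacent order statistics at position
    (n - 1) * p. No density estimate is involved at any stage.
    """
    if not data:
        raise ValueError("need at least one observation")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)
    n = len(xs)
    h = (n - 1) * p            # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)    # guard against running off the end at p = 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Ten observations, e.g. one value per decile group (made-up numbers).
x = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.1, 6.8, 7.5, 9.0]
print(sample_quantile(x, 0.04), sample_quantile(x, 0.50))
```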

In addition to -centile-, -pctile-, etc., -hdquantile- from SSC
implements a method that has worked well for me, the Harrell-Davis
estimator. When I wrote it (2005) I did a bit of reading around, as
reflected in these references from -hdquantile.hlp-:

Harrell, F.E. and C.E. Davis. 1982. A new distribution-free quantile
        estimator. Biometrika 69: 635-640.

Sheather, S.J. and J.S. Marron. 1990. Kernel quantile estimators.
        Journal, American Statistical Association 85: 410-416.

Dielman, T.E., C. Lowry and R. Pfaffenberger. 1994. A comparison of
        quantile estimators. Communications in Statistics - Simulation and
        Computation 23: 355-371.

Hutson, A.D. and M.D. Ernst. 2000. The exact bootstrap mean and variance
        of an L-estimator. Journal, Royal Statistical Society B 62: 89-94.

Ernst, M.D. and A.D. Hutson. 2003. Utilizing a quantile function
        approach to obtain exact bootstrap solutions. Statistical Science
        18: 231-240.
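The Harrell and Davis (1982) estimator cited above writes the p-th quantile as a weighted average of all the order statistics: Q(p) = sum_i W_i x_(i), with W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), a = (n+1)p, b = (n+1)(1-p). Here I_x(a, b) is the regularized incomplete beta function. The following is a minimal Python sketch of that formula, not the code of -hdquantile- itself, and the function names are hypothetical. To stay dependency-free, it evaluates I_x(a, b) with the standard continued-fraction expansion (as in Numerical Recipes). If SciPy is available, scipy.special.betainc(a, b, x) would do the same job.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the expansion directly where it converges fast; otherwise use
    # the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def harrell_davis(data, p):
    """Harrell-Davis estimate of the p-th quantile, 0 < p < 1."""
    if not data:
        raise ValueError("need at least one observation")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    a, b = (n + 1) * p, (n + 1) * (1.0 - p)
    estimate, prev = 0.0, 0.0
    for i, value in enumerate(xs, start=1):
        cur = reg_inc_beta(a, b, i / n)
        estimate += (cur - prev) * value   # weight W_i = I_{i/n} - I_{(i-1)/n}
        prev = cur
    return estimate


x = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.1, 6.8, 7.5, 9.0]
print(harrell_davis(x, 0.04), harrell_davis(x, 0.06))
```

Because every observation gets some weight, the estimate moves smoothly with p. This is one reason the method behaves well in small samples.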

If anyone knows of other important references in this territory, I would
be pleased to hear of them. Quantile estimation is neat stuff and will
be the subject of a future Speaking Stata column in the Stata Journal. 

Nick 
n.j.cox@durham.ac.uk 

Amy

Thank you very much for your reply. To be sure I understand: -kdensity-
and -twoway kdensity- store the density only at those 10 points, and
while the graphical displays differ, the stored values are the same, so
Stata considers them the same estimate even though they look different.
If only 10 values are stored in either case, computed directly from the
10 points I have, is there then no easy way to sample from the density
estimate so that I have imputed values for the 4th percentile, 6th
percentile, etc.? It seems I cannot evaluate the density function at
more points than I have in my sample. I realize that using a density
function estimated from so few points requires a lot of strong
assumptions, but I believe this is regularly done in some (very narrow)
areas.
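A Gaussian kernel density estimate is an equal-weight mixture of normal curves centred on the observations, so it is defined at every point and is easy to sample from. To draw one value, pick an observation at random and add normal noise whose standard deviation is the bandwidth (the "smoothed bootstrap"). The following is a minimal Python sketch that assumes Silverman's normal-reference bandwidth 1.06 s n^(-1/5), which is not necessarily what -kdensity- uses by default. The function names are hypothetical. With ten points, the strong assumptions mentioned above apply in full.

```python
import random
import statistics


def normal_reference_bandwidth(data):
    """Silverman's rule of thumb for a Gaussian kernel: 1.06 * s * n**(-1/5)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)


def sample_from_gaussian_kde(data, size, h=None, seed=None):
    """Draw `size` values from the Gaussian KDE of `data` with bandwidth h.

    The KDE is an equal-weight mixture of N(x_i, h^2) components, so one
    draw = choose an observation uniformly at random, then add N(0, h^2) noise.
    """
    if h is None:
        h = normal_reference_bandwidth(data)
    rng = random.Random(seed)
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(size)]


x = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.1, 6.8, 7.5, 9.0]
draws = sample_from_gaussian_kde(x, 100_000, seed=2010)
cuts = statistics.quantiles(draws, n=100)   # 99 cut points: 1st..99th percentiles
print(cuts[3], cuts[5])                     # 4th and 6th percentiles of the KDE
```

Percentiles obtained this way inherit the bandwidth choice. The mixture's variance is the (divide-by-n) variance of the data plus h^2, so its tails are wider than those of the data themselves.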

--- On Wed, 6/30/10, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> I wouldn't read anything of statistical substance into the differences.
>
> It looks as if -kdensity- and -twoway kdensity- have different graphical
> defaults for drawing the estimated density, one using connected lines
> and the other something smoother, in essence a cubic spline.
>
> I agree with your implied puzzlement: it's not obvious why that should
> be so, but the difference is in any case a matter of presentation.
>
> It's a real stretch to get a decent density function estimate out of any
> sample of the order of 10 observations, and no statistical magic (white
> or otherwise) can help much there. I think there is a marginal advantage
> to using -kdensity- directly and ignoring a histogram. Binning of about
> 10 points can hardly be anything but capricious, and when you have that
> few there is no reason not to show all the raw data in addition to any
> density estimate.
 
Amy
 
> I just thought to re-phrase my question. I've noticed that if I have
> very few data points (e.g. 10) then -kdensity- gives me something jagged
> even if I specify a Gaussian kernel (regardless of the bandwidth). If
> the reason I have so few data points is because I have aggregate data,
> e.g. data for each decile of a population, is there any way to make this
> smoother? Why is it that -histogram X, bin(10) kdensity kdenopts(gauss)-
> will give me something that looks smoother?
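As the replies above suggest, the jaggedness most likely reflects how many points the estimate is evaluated and drawn at, not the kernel. A Gaussian KDE is a smooth function, but joining its values at only ten points with straight lines gives an angular curve. The following minimal Python sketch evaluates the same estimate on a 10-point grid and on a 200-point grid. The bandwidth value, the grids, and the function names are assumptions for illustration. In Stata, -kdensity- has an n() option that sets the number of evaluation points; see -help kdensity- for its default.

```python
import math


def gaussian_kde(data, h):
    """Return the Gaussian kernel density estimate of `data` as a function.

    f(x) = 1 / (n * h) * sum_i phi((x - x_i) / h), phi = standard normal pdf.
    It can be evaluated at any x, not only at the observations.
    """
    n = len(data)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f


def grid(lo, hi, m):
    """m equally spaced points from lo to hi inclusive (m >= 2)."""
    step = (hi - lo) / (m - 1)
    return [lo + k * step for k in range(m)]


def trapezoid(xs, ys):
    """Trapezoid-rule integral of ys over the grid xs."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))


x = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.1, 6.8, 7.5, 9.0]
f = gaussian_kde(x, h=1.0)                # bandwidth chosen arbitrarily here
coarse = grid(min(x), max(x), 10)         # only 10 evaluation points
coarse_y = [f(t) for t in coarse]         # joined by straight lines: angular
fine = grid(min(x) - 4, max(x) + 4, 200)
fine_y = [f(t) for t in fine]             # plotted: a smooth curve
print(trapezoid(fine, fine_y))            # should be close to 1 for a density
```

Both lists come from the same estimate. Only the resolution at which it is drawn differs, which matches the point above that the difference is one of presentation.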


