

From: "Nick Cox" <n.j.cox@durham.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: RE: kdensity with few (/aggregated) data points

Date: Thu, 1 Jul 2010 10:50:06 +0100

If quantile estimation is the real problem, then I think it is much better to do it directly. Going via the density function is like going from New York to Boston via DC.

In addition to -centile-, -pctile-, etc., -hdquantile- from SSC implements a method that has worked well for me. At the time of writing it (2005) I did a bit of reading around, as reflected in these references from -hdquantile.hlp-:

Harrell, F.E. and C.E. Davis. 1982. A new distribution-free quantile estimator. Biometrika 69: 635-640.

Sheather, S.J. and J.S. Marron. 1990. Kernel quantile estimators. Journal, American Statistical Association 85: 410-416.

Dielman, T.E., C. Lowry and R. Pfaffenberger. 1994. A comparison of quantile estimators. Communications in Statistics - Simulation and Computation 23: 355-371.

Hutson, A.D. and M.D. Ernst. 2000. The exact bootstrap mean and variance of an L-estimator. Journal, Royal Statistical Society B 62: 89-94.

Ernst, M.D. and A.D. Hutson. 2003. Utilizing a quantile function approach to obtain exact bootstrap solutions. Statistical Science 18: 231-240.

If anyone knows of other important references in this territory, I would be pleased to hear of them. Quantile estimation is neat stuff and will be the subject of a future Speaking Stata column in the Stata Journal.

Nick
n.j.cox@durham.ac.uk

Amy

Thank you very much for your reply. To be sure I understand, -kdensity- and -twoway kdensity- only store the density for those 10 points and while the graphical display is different the stored values are the same so Stata considers them the same estimate even if they look different.

If only 10 values are stored in either case, directly from the 10 points I have, there is no easy way, then, to sample from the density estimate so that I have imputed values for the 4th percentile, 6th percentile, etc.? It seems I cannot evaluate the density function at more points than I have in my sample.

I realize that it would require a lot of strong assumptions to use a density function taken from so few points, but I believe this is regularly done in some (very narrow) areas.

--- On Wed, 6/30/10, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> I wouldn't read anything of statistical substance into the differences.
>
> It looks as if -kdensity- and -twoway kdensity- have different graphical defaults for drawing the estimated density, one using connected lines and the other something smoother, in essence a cubic spline.
>
> I agree with your implied puzzlement: it's not obvious why that should be so, but the difference is in any case a matter of presentation.
>
> It's a real stretch to get a decent density function estimate out of any sample of the order of 10 observations, and no statistical magic (white or otherwise) can help much there. I think there is a marginal advantage to using -kdensity- directly and ignoring a histogram. Binning of about 10 points can hardly be anything but capricious and when you have that few there is no reason not to show all the raw data in addition to any density estimate.
>
> Amy
>
> I just thought to re-phrase my question. I've noticed that if I have very few data points (e.g. 10) then kdensity gives me something jagged even if I specify a Gaussian kernel (regardless of the bandwidth). If the reason I have so few data points is because I have aggregate data, e.g. data for each decile of a population, is there any way to make this smoother? Why is it that histogram X, bin(10) kdensity kdenopts(gauss) will give me something that looks smoother?
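As an illustration of estimating quantiles directly rather than via a density, here is a minimal Python sketch of the Harrell and Davis (1982) estimator cited above. It is not the code of -hdquantile-; the function names (`hd_weights`, `hd_quantile`) and the sample values are hypothetical, and the incomplete beta function is computed with a standard continued-fraction expansion so that only the standard library is needed. The estimate is a weighted average of all order statistics, with weights given by increments of a Beta(p(n+1), (1-p)(n+1)) distribution function over the intervals ((i-1)/n, i/n].

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd terms of the continued fraction for step m.
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def hd_weights(n, p):
    """Harrell-Davis weights on the n order statistics for quantile p."""
    if n < 1 or not 0.0 < p < 1.0:
        raise ValueError("need n >= 1 and 0 < p < 1")
    a, b = p * (n + 1), (1.0 - p) * (n + 1)
    cdf = [betainc(a, b, i / n) for i in range(n + 1)]
    return [cdf[i] - cdf[i - 1] for i in range(1, n + 1)]


def hd_quantile(data, p):
    """Harrell-Davis estimate of the p-th quantile (0 < p < 1)."""
    xs = sorted(data)
    return sum(w * x for w, x in zip(hd_weights(len(xs), p), xs))


# Hypothetical values standing in for 10 aggregated (decile-level) data points.
sample = [2.1, 3.4, 4.0, 4.7, 5.3, 6.0, 6.8, 7.9, 9.5, 13.2]
for pct in (4, 6, 50):
    print(pct, hd_quantile(sample, pct / 100))
```

Because every order statistic contributes to each estimate, quantiles such as the 4th and 6th percentiles vary smoothly with p even for 10 observations, although with so few points the estimates remain heavily dependent on the data in hand.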
