Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Amy <dartmouthemails@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: kdensity with few (/aggregated) data points
Date: Wed, 30 Jun 2010 22:12:43 -0700 (PDT)

Thank you very much for your reply.

To be sure I understand: -kdensity- and -twoway kdensity- only store the density for those 10 points, and while the graphical display is different, the stored values are the same, so Stata considers them the same estimate even if they look different.

If only 10 values are stored in either case, directly from the 10 points I have, there is no easy way, then, to sample from the density estimate so that I have imputed values for the 4th percentile, 6th percentile, etc.? It seems I cannot evaluate the density function at more points than I have in my sample.

I realize that it would require a lot of strong assumptions to use a density function taken from so few points, but I believe this is regularly done in some (very narrow) areas.

Thank you very much.

--- On Wed, 6/30/10, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> From: Nick Cox <n.j.cox@durham.ac.uk>
> Subject: st: RE: kdensity with few (/aggregated) data points
> To: statalist@hsphsun2.harvard.edu
> Date: Wednesday, June 30, 2010, 12:29 PM
>
> I wouldn't read anything of statistical substance into the differences.
>
> It looks as if -kdensity- and -twoway kdensity- have different graphical
> defaults for drawing the estimated density, one using connected lines
> and the other something smoother, in essence a cubic spline.
>
> I agree with your implied puzzlement: it's not obvious why that should
> be so, but the difference is in any case a matter of presentation.
>
> It's a real stretch to get a decent density function estimate out of any
> sample of the order of 10 observations, and no statistical magic (white
> or otherwise) can help much there. I think there is a marginal advantage
> to using -kdensity- directly and ignoring a histogram. Binning of about
> 10 points can hardly be anything but capricious and when you have that
> few there is no reason not to show all the raw data in addition to any
> density estimate.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Amy
>
> I just thought to re-phrase my question. I've noticed that if I have
> very few data points (e.g. 10) then kdensity gives me something jagged
> even if I specify a Gaussian kernel (regardless of the bandwidth). If
> the reason I have so few data points is because I have aggregate data,
> e.g. data for each decile of a population, is there any way to make this
> smoother? Why is it that histogram X, bin(10) kdensity kdenopts(gauss)
> will give me something that looks smoother?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
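On the question of evaluating the density at more points than there are observations: a Gaussian kernel estimate is the function f(x) = (1/(n h)) Σ φ((x − x_i)/h), which is defined for any x, so the evaluation grid does not have to match the sample. In Stata, -kdensity- documents the n() and at() options for choosing the evaluation points. The sketch below is plain Python rather than Stata and is not Stata's implementation. The ten decile values, the bandwidth `h`, and the helper names `gaussian_kde` and `quantile` are all made up for illustration. It evaluates the estimate on a 200-point grid and reads off rough 4th and 6th percentiles by numerically integrating the density.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return the function f(x) = (1/(n*h)) * sum_i phi((x - x_i)/h)."""
    n = len(sample)

    def density(x):
        total = 0.0
        for xi in sample:
            u = (x - xi) / bandwidth
            total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        return total / (n * bandwidth)

    return density

# Hypothetical aggregate data: one value per decile (illustrative only).
deciles = [12.0, 15.5, 18.0, 20.5, 23.0, 26.0, 29.5, 34.0, 41.0, 55.0]
h = 5.0  # assumed bandwidth, chosen by hand for this sketch
f = gaussian_kde(deciles, h)

# Evaluate on a grid much finer than the sample (200 points, not 10).
lo = min(deciles) - 3 * h
hi = max(deciles) + 3 * h
grid = [lo + i * (hi - lo) / 199 for i in range(200)]
dens = [f(x) for x in grid]

# Trapezoid rule gives a cumulative distribution on the same grid.
cum = [0.0]
for i in range(1, len(grid)):
    cum.append(cum[-1] + 0.5 * (dens[i] + dens[i - 1]) * (grid[i] - grid[i - 1]))
total = cum[-1]               # close to 1; the small shortfall is tail mass
cdf = [c / total for c in cum]

def quantile(p):
    """Invert the gridded CDF by linear interpolation."""
    for i in range(1, len(grid)):
        if cdf[i] >= p:
            w = (p - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            return grid[i - 1] + w * (grid[i] - grid[i - 1])
    return grid[-1]

q04 = quantile(0.04)
q06 = quantile(0.06)
print(q04, q06)
```

This treats the decile values as if they were ten ordinary observations, which is exactly the strong assumption already acknowledged above. The result also depends heavily on the bandwidth, so the imputed percentiles should be read with the same caution as the density estimate itself.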

**Follow-Ups**: **RE: st: RE: kdensity with few (/aggregated) data points** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
