
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: RE: kdensity with few (/aggregated) data points

From   "Nick Cox" <>
To   <>
Subject   st: RE: kdensity with few (/aggregated) data points
Date   Wed, 30 Jun 2010 17:29:14 +0100

I wouldn't read anything of statistical substance into the differences. 

It looks as if -kdensity- and -twoway kdensity- have different graphical
defaults for drawing the estimated density, one using connected lines
and the other something smoother, in essence a cubic spline. 

I agree with your implied puzzlement: it's not obvious why that should
be so, but the difference is in any case a matter of presentation. 
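To show why this is only presentation, here is a minimal sketch in Python/NumPy (not Stata's own code) of a Gaussian kernel density estimate. The ten data values and the bandwidth `h` are hypothetical, and the grid sizes of 10 and 300 are illustrative assumptions, not a statement of either command's defaults. The same estimate is evaluated at 10 points and at 300 points. Joining the 10 values with straight lines departs from the finely evaluated curve, even though both are samples of one smooth function.

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at each point of `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distances between every grid point and every observation
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # f(x) = 1/(n h) * sum_i K((x - x_i) / h)
    return kernel.sum(axis=1) / (len(data) * bandwidth)

# Hypothetical sample of 10 values (e.g. one per decile); not data from the thread
x = np.array([1.2, 1.9, 2.4, 2.8, 3.1, 3.9, 4.6, 5.5, 6.8, 8.7])
h = 0.9  # hypothetical bandwidth, held fixed for both evaluations

coarse = np.linspace(x.min(), x.max(), 10)   # few evaluation points
fine = np.linspace(x.min(), x.max(), 300)    # many evaluation points

f_coarse = gaussian_kde(x, coarse, h)
f_fine = gaussian_kde(x, fine, h)

# Straight lines joining the 10 coarse values, read off at the fine grid
f_interp = np.interp(fine, coarse, f_coarse)
print("max gap between polyline and curve:", np.abs(f_interp - f_fine).max())
```

The kernel and bandwidth are identical in both cases. Only the number of points at which the estimate is drawn changes. In Stata the analogous setting is the -n()- option of -kdensity- and -twoway kdensity-, which sets the number of points at which the density is estimated; see -help kdensity- for its default.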

It's a real stretch to get a decent density function estimate out of any
sample of the order of 10 observations, and no statistical magic (white
or otherwise) can help much there. I think there is a marginal advantage
to using -kdensity- directly and ignoring a histogram. Binning of about
10 points can hardly be anything but capricious, and when you have that
few there is no reason not to show all the raw data in addition to any
density estimate. 
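To see how capricious binning ten points can be, here is a minimal Python/NumPy sketch. The sample and the bin width of 2 are hypothetical. The bin width stays fixed while only the bin origin moves, by half a bin width.

```python
import numpy as np

# Hypothetical sample of 10 values (illustration only, not data from the thread)
x = np.array([1.2, 1.9, 2.4, 2.8, 3.1, 3.9, 4.6, 5.5, 6.8, 8.7])
width = 2.0  # hypothetical bin width, held fixed

counts_by_origin = {}
for origin in (0.0, -1.0):
    # Six edges give five bins of equal width starting at `origin`
    edges = origin + width * np.arange(6)
    counts, _ = np.histogram(x, bins=edges)
    counts_by_origin[origin] = counts
    print(f"origin {origin:+.1f}: edges {edges.tolist()} counts {counts.tolist()}")
```

With these values the counts are [2, 4, 2, 1, 1] for origin 0 and [0, 4, 3, 2, 1] for origin -1. The first bin empties and the tallest bar shifts left, although the data and the width are unchanged. With so few observations, shifts like this make up much of what a histogram displays.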



I just thought I'd rephrase my question. I've noticed that if I have
very few data points (e.g. 10), -kdensity- gives me something jagged
even if I specify a Gaussian kernel, regardless of the bandwidth. If
the reason I have so few data points is that I have aggregate data,
e.g. data for each decile of a population, is there any way to make the
estimate smoother? And why does
-histogram X, bin(10) kdensity kdenopts(gauss)-
give me something that looks smoother?

