
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: kdensity with few (/aggregated) data points


From   Amy <[email protected]>
To   [email protected]
Subject   Re: st: RE: kdensity with few (/aggregated) data points
Date   Wed, 30 Jun 2010 22:12:43 -0700 (PDT)

Thank you very much for your reply. To be sure I understand: -kdensity- and -twoway kdensity- each store the density only at those 10 points. The graphical displays differ, but the stored values are the same, so Stata treats them as the same estimate even though they look different.

If only 10 values are stored in either case, taken directly at the 10 points I have, is there then no easy way to sample from the density estimate so that I have imputed values for the 4th percentile, the 6th percentile, and so on? It seems I cannot evaluate the density function at more points than I have in my sample. I realize that using a density function estimated from so few points requires a lot of strong assumptions, but I believe this is regularly done in some (very narrow) areas.
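
On the question of evaluating the density at more points than there are observations: a kernel density estimate is a function of x, so in principle it can be evaluated on any grid, and its percentiles can be read off its cumulative distribution. The following is a minimal sketch of that idea in Python, not a reproduction of what Stata does internally. The 10 data values are hypothetical, the function names are made up for illustration, and the bandwidth uses a normal-reference rule of thumb (1.06 * sd * n^(-1/5)), which is an assumption and not necessarily Stata's default.

```python
import math

# Hypothetical values standing in for 10 decile-level observations.
data = [12.0, 15.5, 18.0, 21.0, 23.5, 26.0, 29.5, 33.0, 38.0, 47.0]


def rule_of_thumb_bandwidth(xs):
    """Normal-reference bandwidth: 1.06 * sample sd * n^(-1/5) (an assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde_pdf(x, xs, h):
    """Gaussian kernel density estimate at an arbitrary point x."""
    norm = len(xs) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs) / norm


def kde_cdf(x, xs, h):
    """CDF of the estimate: the average of normal CDFs centred on each data point."""
    return sum(0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0)))) for xi in xs) / len(xs)


def kde_quantile(p, xs, h, tol=1e-9):
    """Invert the CDF by bisection to get the quantile for probability p (0 < p < 1)."""
    lo, hi = min(xs) - 10.0 * h, max(xs) + 10.0 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kde_cdf(mid, xs, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


h = rule_of_thumb_bandwidth(data)

# Evaluate the density at 200 grid points, far more than the 10 observations.
step = (max(data) - min(data)) / 199
grid = [min(data) + i * step for i in range(200)]
density = [kde_pdf(x, data, h) for x in grid]

# Imputed values for the 4th and 6th percentiles of the smoothed distribution.
q04 = kde_quantile(0.04, data, h)
q06 = kde_quantile(0.06, data, h)
print(f"bandwidth={h:.3f}  4th pct={q04:.3f}  6th pct={q06:.3f}")
```

The percentiles obtained this way depend heavily on the kernel and bandwidth, which is the practical form of the "strong assumptions" caveat with so few points. In Stata itself, the n() and at() options of -kdensity- look like the relevant controls for choosing the evaluation points; their exact behaviour should be checked against the manual.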

Thank you very much.



--- On Wed, 6/30/10, Nick Cox <[email protected]> wrote:

> From: Nick Cox <[email protected]>
> Subject: st: RE: kdensity with few (/aggregated) data points
> To: [email protected]
> Date: Wednesday, June 30, 2010, 12:29 PM
> I wouldn't read anything of statistical substance into the differences.
> 
> It looks as if -kdensity- and -twoway kdensity- have different graphical
> defaults for drawing the estimated density, one using connected lines
> and the other something smoother, in essence a cubic spline.
> 
> I agree with your implied puzzlement: it's not obvious why that should
> be so, but the difference is in any case a matter of presentation.
> 
> It's a real stretch to get a decent density function estimate out of any
> sample of the order of 10 observations, and no statistical magic (white
> or otherwise) can help much there. I think there is a marginal advantage
> to using -kdensity- directly and ignoring a histogram. Binning of about
> 10 points can hardly be anything but capricious and when you have that
> few there is no reason not to show all the raw data in addition to any
> density estimate.
> 
> Nick 
> [email protected]
> 
> 
> Amy
> 
> I just thought to re-phrase my question. I've noticed that if I have
> very few data points (e.g. 10) then -kdensity- gives me something jagged
> even if I specify a Gaussian kernel (regardless of the bandwidth). If
> the reason I have so few data points is because I have aggregate data,
> e.g. data for each decile of a population, is there any way to make this
> smoother? Why is it that -histogram X, bin(10) kdensity kdenopts(gauss)-
> will give me something that looks smoother?
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


      


