Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Philippe Van Kerm <philippe.vankerm@ceps.lu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: How to plot cdf after corrected kernel density
Date: Fri, 4 Oct 2013 12:30:46 +0000

Nick is right: -akdensity- does nothing specific to address lower and upper bounds for data on a restricted range.

I guess the short answer Monica may be looking for is to use -integ- instead of -cumul- (but see Nick's point about crude integration methods on smart PDF estimation):

sysuse auto
_kdens mpg, g(b a)
integ b a, gen(cb)
line cb a, sort

Note that if interest is ultimately in the (smoothed) CDF, she requires a much smaller bandwidth than what would be 'optimal' for the PDF estimation.

A transformation of the data may sometimes be a way to deal with boundary issues in kernel density estimation.

Philippe

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Friday, October 04, 2013 11:07 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: How to plot cdf after corrected kernel density
>
> -akdensity- from Philippe Van Kerm (SJ) is an excellent command, but
> I don't see options to respect lower and upper bounds, as Monica's
> problem evidently requires. Philippe will correct me if I am wrong.
>
> However, her post does not dwell on this aspect and she uses an
> accessible example (-mpg- in the auto dataset), for which this problem
> does not bite.
>
> In practice, -akdensity- appears to produce estimates for the density
> for a range wider than the observed data, so that might entail
> projecting beyond the natural support of the data.
>
> The advice here depends a little on what the aim is, which could range
> from just wanting a nicer graph for display (because you don't trust
> the irregularities that are visible) to wanting numerical estimates
> too for some later purpose.
>
> Clearly there is no such thing as "the" smoothed cdf, as it is easy to
> think of several ways to get a cdf, either directly or indirectly.
>
> Also, for most purposes it would be expected that you might have to
> explain how you got a smoothed cdf.
> In principle, naturally, the cdf is just the integral of the pdf, but
> any method that is smart about calculating the pdf but crude about
> integrating it may not be optimal.
>
> I am fond of kernel density methods and often use them, but their
> emergence as a default or standard method seems a little accidental.
> As they are essentially local methods, they don't place a high premium
> (or indeed any at all) on global smoothness. For visualization they
> can be a little conservative which is usually an excellent thing, as
> researchers should always be on the lookout for quirky details of
> their distributions.
>
> Other methods (including logspline density estimation) work well, but
> on a quick search I can't find a Stata implementation.
>
> All that said, I still prefer estimating quantiles; it's really the
> same problem, as graphically you are just exchanging axes.
>
> Nick
> njcoxstata@gmail.com
>
> On 3 October 2013 23:20, Alfonso S <oneiros_spain@yahoo.com> wrote:
>
> > I suggest you download the package akdensity (st0037_3). It does an
> > adaptive kernel density and generates the cdf variable as well. Use
> > the code below to check it out.
> >
> > sysuse auto
> > akdensity mpg, g(a b) cdf(cb)
> > line cb a
> >
> > Let me know if that is what you were looking for.
> >
> > From: Nick Cox <njcoxstata@gmail.com>
> >
> > The bottom line in the post you cite advises
> >
> > "I prefer to get smoother cumulative distribution functions directly
> > from estimated quantiles."
> >
> > I agree with that.
> >
> > On 3 October 2013 21:45, Jain, Monica (HarvestPlus) <M.Jain@cgiar.org> wrote:
> >
> >> I am using -kdens- and I do not know how to plot the cumulative
> >> distribution function. I am using Stata 13 for Windows.
> >>
> >> I am using -kdens- to estimate kernel density correcting for bounded
> >> variables using linear combination method. I want to plot the
> >> cumulative distribution function for the estimated kernel densities.
> >> On one of the Statalist threads
> >> (http://www.stata.com/statalist/archive/2005-04/msg00798.html), the
> >> following method has been suggested to plot them:
> >>
> >> sysuse auto
> >> _kdens mpg, g(b a)
> >> cumul b, g(cb)
> >> line cb b, sort
> >>
> >> With the above command, I get the densities on the x-axis, rather
> >> than the [x]. I looked all over the web to check if I can find how to
> >> do it, but I have not been successful. If I use the following command:
> >>
> >> line cb a, sort
> >>
> >> I get a weird triangle-shaped graph.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
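
For readers without -kdens- installed, the -integ- route suggested in the reply at the top of this message can be sketched with official Stata commands only. This is an illustration, not something posted in the thread: it swaps the built-in -kdensity- in for -_kdens-, so it makes no correction for bounded support, and the variable names a, b and cb are arbitrary. Note that -kdensity- expects the grid variable first in generate(), the reverse of the order used in the -_kdens- call above.

```stata
* Sketch only: plain -kdensity-, no boundary correction (an assumption,
* not the corrected estimator asked about in the thread)
sysuse auto, clear

* a = evaluation grid, b = density estimate at each grid point
kdensity mpg, generate(a b) nograph

* cb = running integral of b over a, i.e. a smoothed cdf on the grid
* (cubic splines by default; add the trapezoid option for the trapezoidal rule)
integ b a, generate(cb)

* plot the cdf against the grid, not against the density values
line cb a, sort
```

A quick check is -summarize cb-: if the maximum is not close to 1, the grid is cutting off part of the estimated density.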