RE: st: How to plot cdf after corrected kernel density

From   philippe van kerm <>
To   "" <>
Subject   RE: st: How to plot cdf after corrected kernel density
Date   Fri, 4 Oct 2013 12:30:46 +0000

Nick is right: -akdensity- does nothing specific to address lower and upper bounds for data on a restricted range. 

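I guess the short answer Monica may be looking for is to use -integ- instead of -cumul-. -cumul- computes the empirical CDF of whatever variable it is given (here, the density values themselves), whereas -integ- numerically integrates the density over the grid. (But see Nick's point about crude integration methods on smart PDF estimation.)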

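  sysuse auto
  * b holds the density estimates, a the grid points
  _kdens mpg, g(b a)
  * integrate the density b over the grid a to obtain the CDF
  integ b a, gen(cb)
  line cb a, sort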
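
To make Nick's point about crude integration concrete, here is a minimal sketch that compares the default cubic-spline rule of -integ- with its -trapezoid- option. It swaps in official -kdensity- for -_kdens- (an assumption made only so that nothing needs installing; note that -kdensity- lists the grid variable first in generate()). The variable names x, d, cdf_spline and cdf_trap are made up for illustration.

```stata
sysuse auto, clear
* density d evaluated on grid x (official -kdensity-: grid first, density second)
kdensity mpg, generate(x d) nograph
* default -integ- rule: cubic splines
integ d x, generate(cdf_spline)
* cruder alternative: trapezoidal rule
integ d x, generate(cdf_trap) trapezoid
* both running integrals should end close to 1 if the grid covers the bulk of the density
summarize cdf_spline cdf_trap
line cdf_spline cdf_trap x, sort
```

If the two curves are visually indistinguishable, the integration rule is probably not the weak link for that grid.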

Note that if interest is ultimately in the (smoothed) CDF, she requires a much smaller bandwidth than what would be 'optimal' for the PDF estimation. 
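
As a rough illustration of the bandwidth point, the sketch below integrates two density estimates, one with the default bandwidth and one with half of it, and overlays the raw empirical CDF. It again uses official -kdensity- rather than -_kdens-. The factor 0.5 is arbitrary and chosen only for illustration, not a recommended rule, and all variable and macro names are hypothetical. Run it from a do-file, since it uses /// line continuation.

```stata
sysuse auto, clear
* default bandwidth, chosen by -kdensity- with the PDF in mind
kdensity mpg, generate(x1 d1) nograph
local h = r(bwidth)
* halve it; the factor 0.5 is arbitrary and purely illustrative
local hsmall = 0.5*`h'
kdensity mpg, generate(x2 d2) bwidth(`hsmall') nograph
* integrate each density over its own grid
integ d1 x1, generate(cdf1)
integ d2 x2, generate(cdf2)
* raw empirical CDF of mpg for reference
cumul mpg, generate(ecdf)
twoway (line cdf1 x1, sort) (line cdf2 x2, sort)              ///
       (line ecdf mpg, sort connect(J)),                       ///
       legend(order(1 "default bandwidth" 2 "half bandwidth" 3 "empirical CDF"))
```

The smaller bandwidth should track the empirical CDF more closely; how small to go remains a judgment call.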

A transformation of the data may sometimes be a way to deal with boundary issues in kernel density estimation.
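
One way this could look, assuming a variable bounded below at zero and a log transformation (the choice of transformation is an assumption for illustration, not something prescribed above): estimate the density of ln(mpg), then map it back to the original scale with the change-of-variables rule f_X(x) = f_Y(ln x)/x. The variable names are hypothetical, and official -kdensity- is used so that nothing needs installing.

```stata
sysuse auto, clear
* work on the log scale, where the lower bound at 0 disappears
generate double lny = ln(mpg)
kdensity lny, generate(gy dy) nograph
* back-transform the grid and apply the Jacobian 1/x to the density
generate double gx = exp(gy)
generate double dx = dy/gx
* the back-transformed density puts no mass below 0 by construction
line dx gx, sort
* the CDF on the original scale follows by integrating as before
integ dx gx, generate(cdfx)
line cdfx gx, sort
```

An upper bound, or bounds on both sides, would need a different transformation (for example a logit after rescaling), with the corresponding Jacobian.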


> -----Original Message-----
> From: [mailto:owner-] On Behalf Of Nick Cox
> Sent: Friday, October 04, 2013 11:07 AM
> To:
> Subject: Re: st: How to plot cdf after corrected kernel density
>
> -akdensity- from Philippe Van Kerm (Stata Journal) is an excellent
> command, but I don't see options to respect lower and upper bounds, as
> Monica's problem evidently requires. Philippe will correct me if I am
> wrong. However, her post does not dwell on this aspect and she uses an
> accessible example (-mpg- in the auto dataset), for which this problem
> does not bite.
>
> In practice, -akdensity- appears to produce estimates for the density
> for a range wider than the observed data, so that might entail
> projecting beyond the natural support of the data.
>
> The advice here depends a little on what the aim is, which could range
> from just wanting a nicer graph for display (because you don't trust
> the irregularities that are visible) to wanting numerical estimates
> too for some later purpose.
>
> Clearly there is no such thing as "the" smoothed cdf, as it is easy to
> think of several ways to get a cdf, either directly or indirectly.
> Also, for most purposes it would be expected that you might have to
> explain how you got a smoothed cdf. In principle, naturally, the cdf
> is just the integral of the pdf, but any method that is smart about
> calculating the pdf but crude about integrating it may not be optimal.
>
> I am fond of kernel density methods and often use them, but their
> emergence as a default or standard method seems a little accidental.
> As they are essentially local methods, they don't place a high premium
> (or indeed any at all) on global smoothness. For visualization they
> can be a little conservative, which is usually an excellent thing, as
> researchers should always be on the lookout for quirky details of
> their distributions.
>
> Other methods (including logspline density estimation) work well, but
> on a quick search I can't find a Stata implementation.
>
> All that said, I still prefer estimating quantiles; it's really the
> same problem, as graphically you are just exchanging axes.
>
> Nick
>
> On 3 October 2013 23:20, Alfonso S <> wrote:
> > I suggest you download the package akdensity (st0037_3). It does an
> > adaptive kernel density and generates the cdf variable as well. Use the
> > code below to check it out.
> >
> > sysuse auto
> > akdensity mpg, g(a b) cdf(cb)
> > line cb a
> >
> > Let me know if that is what you were looking for.
> From: Nick Cox <>
> > The bottom line in the post you cite advises
> >
> > "I prefer to get smoother cumulative distribution functions directly
> > from estimated quantiles."
> >
> > I agree with that.
> On 3 October 2013 21:45, Jain, Monica (HarvestPlus) <> wrote:
> >> I am using -kdens- and I do not know how to plot the cumulative
> >> distribution function. I am using Stata 13 for Windows.
> >>
> >> I am using -kdens- to estimate kernel densities, correcting for
> >> bounded variables using the linear combination method. I want to plot
> >> the cumulative distribution function for the estimated kernel
> >> densities. On one of the Statalist threads (, the
> >> following method has been suggested to plot them:
> >>
> >> sysuse auto
> >> _kdens mpg, g(b a)
> >> cumul b, g(cb)
> >> line cb b, sort
> >>
> >> With the above command, I get the densities on the x-axis, rather
> >> than the [x]. I looked all over the web to check if I can find how to
> >> do it, but I have not been successful. If I use the following command:
> >>
> >> line cb a, sort
> >>
> >> I get a weird triangle-shaped graph.