Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: n() option for -akdensity-

From	philippe van kerm <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: n() option for -akdensity-
Date	Wed, 2 Apr 2014 09:28:29 +0000

-akdensity- (like -kdensity-) will not set the number of evaluation points n() above _N, because doing so would require adding observations to your dataset. -akdensity- will not do that for you, but you can take responsibility for it yourself with -set obs ...-:

sysuse auto, clear
set obs 500
kdensity price, n(500)
akdensity price, n(500)
akdensity price, bwidth(1500) n(500) generate(x fx) cdf(Fx)
tw line Fx x
tw line fx x

Estimates of fx and Fx at any given x are unaffected by the number of evaluation points you select (it is the bandwidth choice that matters here). But the apparent smoothness of the curve when you plot and connect your estimates does depend on n, since by default -tw line- connects points with straight line segments. Try:

akdensity price, bwidth(1500) n(10) generate(x10 fx10) 
akdensity price, bwidth(1500) n(50) generate(x50 fx50) 
akdensity price, bwidth(1500) n(500) generate(x500 fx500) 
gen aty10 = 0
gen aty50 = -0.00002
gen aty500 = -0.00004
tw (line fx10 x10) (line fx50 x50) (line fx500 x500) (scatter aty10 x10) (scatter aty50 x50) (scatter aty500 x500)

In this example the plots for n(50) and n(500) are pretty much the same.
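
Since the original question was about a cutoff for a one-in-ten event, here is a rough sketch of how such a cutoff could be read off the estimated CDF once the grid is fine enough. It reuses the auto data and bwidth(1500) from above purely for illustration; the variable names xg, fg and Fg and the 0.90 threshold are my assumptions, not anything prescribed by -akdensity-. The -centile- line is only there as a sanity check against the empirical percentile.

```stata
* Sketch only: assumes -akdensity- is installed (ssc install akdensity).
* xg, fg, Fg and the 0.90 threshold are illustrative choices.
sysuse auto, clear
set obs 500
akdensity price, bwidth(1500) n(500) generate(xg fg) cdf(Fg)

* Smallest grid point at which the estimated CDF reaches 0.90
summarize xg if Fg >= 0.90 & !missing(Fg), meanonly
display "approximate upper one-in-ten cutoff: " r(min)

* Empirical 90th percentile of the 74 observed prices, for comparison
centile price, centile(90)
```

The resolution of the cutoff found this way is limited by the grid spacing, so a larger n() gives a finer answer, but it cannot add information that is not in the sample.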

Philippe



> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Nick Cox
> Sent: Tuesday, April 01, 2014 4:46 PM
> To: [email protected]
> Subject: Re: st: n() option for -akdensity-
> 
> I could comment in detail here but let me focus on the top and bottom
> line, which was an interest in estimating a cumulative distribution
> function.
> 
> That being so, I think you might consider starting from quantile
> estimation and inverting the quantile function. See e.g. -hdquantile-
> (SSC).
> 
> The road here is pocked by pitfalls, most notably not being able to
> extract information that's not in a small sample.
> 
> Nick
> [email protected]
> 
> 
> On 1 April 2014 16:30, Katie Farrin <[email protected]> wrote:
> > Thanks, Alfonso.  Yes, that makes sense in terms of the kernel density
> > estimation that the max n is the number of observations.  I was just
> > hoping there was some trick to get a more continuous CDF from all of
> > it.  So far I haven't figured it out.
> >
> > Katie
> >
> > On Tue, Apr 1, 2014 at 11:22 AM, Alfonso Sánchez-Peñalver
> > <[email protected]> wrote:
> >> From what I remember from nonparametrics kernel density programming,
> >> you used the sample data points to place a kernel around it. For more
> >> or less smoothing, you set the bandwidth. From the help on -kdensity-
> >> (since -n- is really a -kdensity- option):
> >>
> >>  n(#) specifies the number of points at which the density estimate
> >>  is to be evaluated.  The default is min(N,50), where N is the number
> >>  of observations in memory.
> >>
> >> To check whether I was correct or not I did the following:
> >>
> >> sysuse auto, clear
> >> kdensity price, n(100)
> >> akdensity price, n(100)
> >>
> >> Both kdensity and akdensity set the number of observations
> >> automatically to 74 (the maximum available). They don't throw an error,
> >> so I ran
> >>
> >> akdensity price, bwidth(.5) normal n(500) generate(gx_chim fgx_chim)
> >>
> >> to see if it was something in your code, and it doesn't throw an
> >> error either. It sets the observations to 74 once again.
> >>
> >> Best,
> >>
> >> Alfonso Sánchez-Peñalver, PhD
> >>
> >>
> >> On Apr 1, 2014, at 11:03 AM, Katie Farrin <[email protected]> wrote:
> >>
> >>> I could be wrong, but it was my understanding that the n() option
> >>> allowed for some sort of interpolation to get a smoother density
> >>> function from a finite number of data points.  If this isn't possible
> >>> I can stick with what I have, but I am trying to find a cutoff point
> >>> for a one-in-ten event and don't have that level of precision using
> >>> the data I have.
> >>>
> >>> Thanks for your response.
> >>>
> >>> Katie
> >>>
> >>> On Tue, Apr 1, 2014 at 10:54 AM, Alfonso Sánchez-Peñalver
> >>> <[email protected]> wrote:
> >>>> Forgive me if I'm wrong because I don't know much about how the
> >>>> adaptive version of a kernel density works (by the way, you should
> >>>> mention that -akdensity- is available from SSC), but the whole point of
> >>>> a kernel density is to use the sample points as kernels and
> >>>> non-parametrically estimate the density function under the assumption
> >>>> that the sample follows the population's distribution. How are you going
> >>>> to use more points than those you have in the data? Are you going to
> >>>> make them up? I may have misunderstood what you meant with larger n, but
> >>>> I'm really confused.
> >>>>
> >>>> Alfonso Sánchez-Peñalver, PhD
> >>>>
> >>>>
> >>>> On Apr 1, 2014, at 10:33 AM, Katie Farrin <[email protected]> wrote:
> >>>>
> >>>>> Good Morning,
> >>>>>
> >>>>> I'm trying to estimate a cdf from a kernel density using a data series
> >>>>> (n=72) I have using the -akdensity- command.  However, I'd like more
> >>>>> data points in the cdf than I have for actual observations and am
> >>>>> having trouble using the n() option for -akdensity- and am hoping
> >>>>> someone can give me some advice on how to create a cdf with a larger
> >>>>> n.
> >>>>>
> >>>>> I'm plotting the kernel density along with a normal distribution with
> >>>>> n(500) and would like the same number of data points for the kernel
> >>>>> and cdf, but I get an error message when I try to specify n in the
> >>>>> options for the cdf.
> >>>>>
> >>>>> Here is the code I'm using:
> >>>>>
> >>>>> akdensity GC, bwidth(.5) normal n(500) generate(gx_chim fgx_chim) cdf(cdf_g_chim)
> >>>>> line cdf_g_chim gx_chim
> >>>>>
> >>>>> Any help would be greatly appreciated.
> >>>>>
> >>>>> Katie
> >>>>> *
> >>>>> *   For searches and help try:
> >>>>> *   http://www.stata.com/help.cgi?search
> >>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> >>>>> *   http://www.ats.ucla.edu/stat/stata/
