Re: st: n() option for -akdensity-

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: n() option for -akdensity-
Date	Tue, 1 Apr 2014 16:44:38 +0100

I could comment in detail here but let me focus on the top and bottom
line, which was an interest in estimating a cumulative distribution
function.

That being so, I think you might consider starting from quantile
estimation and inverting the quantile function. See e.g. -hdquantile-
(SSC).

The road here is pocked by pitfalls, most notably not being able to
extract information that's not in a small sample.

Nick
[email protected]
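
[Editorial sketch, not part of the original post.] One possible way to act on the quantile suggestion, using only official commands and the auto data from earlier in the thread as a stand-in for the poster's series. -centile- reports the 10th percentile together with a confidence interval, which gives a sense of how loosely a sample of this size pins down a one-in-ten cutoff. The second half rests on an assumption that is not confirmed anywhere in the thread: since -kdensity- caps n() at the number of observations in memory, padding the dataset with empty observations should allow a 500-point grid, which -integ- can then cumulate into an approximate CDF. The variable names gx, fgx and cdf_k are made up for illustration, and whether the same padding works with -akdensity- is untested here.

```stata
sysuse auto, clear

* Direct estimate of the one-in-ten cutoff, with a confidence interval
centile price, centile(10)

* Assumption: kdensity caps n() at _N, so pad the data with empty
* observations; price is missing there, so the estimate still uses 74 values
set obs 500
kdensity price, n(500) generate(gx fgx) nograph

* Cumulate the density over the grid (trapezoid rule) to approximate the CDF
integ fgx gx, generate(cdf_k) trapezoid

* First grid point at which the approximate CDF reaches 0.10
sort gx
list gx cdf_k if cdf_k >= 0.10 & cdf_k[_n-1] < 0.10
```

A finer grid only smooths the plotted curve. It does not add information beyond the 74 observations, so the interval from -centile- is the more honest guide to precision.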


On 1 April 2014 16:30, Katie Farrin <[email protected]> wrote:
> Thanks, Alfonso.  Yes, that makes sense in terms of the kernel density
> estimation that the max n is the number of observations.  I was just
> hoping there was some trick to get a more continuous CDF from all of
> it.  So far I haven't figured it out.
>
> Katie
>
> On Tue, Apr 1, 2014 at 11:22 AM, Alfonso Sánchez-Peñalver
> <[email protected]> wrote:
>> From what I remember of programming nonparametric kernel density estimators, you place a kernel around each sample data point and set the bandwidth for more or less smoothing. From the help on -kdensity- (since -n- is really a -kdensity- option):
>>
>>  n(#) specifies the number of points at which the density estimate is to be evaluated.  The default is min(N,50), where N is the number of observations in memory.
>>
>> To check whether I was correct or not I did the following:
>>
>> sysuse auto, clear
>> kdensity price, n(100)
>> akdensity price, n(100)
>>
>> Both kdensity and akdensity automatically reset n() to 74, the number of observations available. Neither throws an error, so I ran
>>
>> akdensity price, bwidth(.5) normal n(500) generate(gx_chim fgx_chim)
>>
>> to see if it was something in your code, and it doesn't throw an error either. It resets n() to 74 once again.
>>
>> Best,
>>
>> Alfonso Sánchez-Peñalver, PhD
>>
>>
>> On Apr 1, 2014, at 11:03 AM, Katie Farrin <[email protected]> wrote:
>>
>>> I could be wrong, but it was my understanding that the n() option
>>> allowed for some sort of interpolation to get a smoother density
>>> function from a finite number of data points.  If this isn't possible
>>> I can stick with what I have, but I am trying to find a cutoff point
>>> for a one-in-ten event and don't have that level of precision using
>>> the data I have.
>>>
>>> Thanks for your response.
>>>
>>> Katie
>>>
>>> On Tue, Apr 1, 2014 at 10:54 AM, Alfonso Sánchez-Peñalver
>>> <[email protected]> wrote:
>>>> Forgive me if I'm wrong, because I don't know much about how the adaptive version of a kernel density works (by the way, you should mention that -akdensity- is available from SSC), but the whole point of a kernel density is to place kernels at the sample points and estimate the density function nonparametrically, under the assumption that the sample follows the population's distribution. How are you going to use more points than you have in the data? Are you going to make them up? I may have misunderstood what you meant by a larger n, but I'm really confused.
>>>>
>>>> Alfonso Sánchez-Peñalver, PhD
>>>>
>>>>
>>>> On Apr 1, 2014, at 10:33 AM, Katie Farrin <[email protected]> wrote:
>>>>
>>>>> Good Morning,
>>>>>
>>>>> I'm trying to estimate a cdf from a kernel density using a data series
>>>>> (n=72) I have using the -akdensity- command.  However, I'd like more
>>>>> data points in the cdf than I have for actual observations and am
>>>>> having trouble using the n() option for -akdensity- and am hoping
>>>>> someone can give me some advice on how to create a cdf with a larger
>>>>> n.
>>>>>
>>>>> I'm plotting the kernel density along with a normal distribution with
>>>>> n(500) and would like the same number of data points for the kernel
>>>>> and cdf, but I get an error message when I try to specify n in the
>>>>> options for the cdf.
>>>>>
>>>>> Here is the code I'm using:
>>>>>
>>>>> akdensity GC, bwidth(.5) normal n(500) generate(gx_chim fgx_chim) cdf(cdf_g_chim)
>>>>> line cdf_g_chim gx_chim
>>>>>
>>>>> Any help would be greatly appreciated.
>>>>>
>>>>> Katie
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>> *   http://www.ats.ucla.edu/stat/stata/
