
From: "Eva Poen" <eva.poen@unisg.ch>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Re: Kernel density estimation in a large dataset
Date: Tue, 16 Nov 2004 19:17:02 +0100

Thanks a lot for this suggestion. I am not sure whether I need equally spaced intervals for the density estimate (this seems to be standard). I ended up doing

    kdensity x, n(1000) gen(grid dens)
    sort grid
    gen density = .
    forvalues i = 2/19426 {
        qui count if grid < x[`i']
        qui replace density = (dens[r(N)] + dens[r(N)+1])/2 in `i'
    }

While this approach works, it turns out that it takes nearly as long as computing the density for all observations in the first place.

In the meantime, I tried this in EViews (with exactly the same data, bandwidth and N) and found that density estimation and interpolation take about 3 seconds (!) in EViews, while Stata takes about 10 minutes overall. I was very surprised by this huge difference in speed.

Thanks again and best wishes,
Eva

Nichols, Austin wrote:
> You could
> . sort x
> . gen y=x if mod(_n,20)==0 | _n==1 | _n==_N
> . kdensity x, at(y) gen(xdens)
> . ipolate xdens x, gen(f)
>
> for example.
>
> -----Original Message-----
> From: Eva Poen [mailto:eva.poen@unisg.ch]
> Sent: Tuesday, November 16, 2004 10:55 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Kernel density estimation in a large dataset
>
> Dear all,
>
> I want to do kernel density estimation and local polynomial regression
> on a dataset with 20'000 observations using Stata 8.2. Computations
> using all observations as a grid, like in
>
> - kdensity x, at(x) gen(xdens) -
>
> take quite a long time (between 10 and 15 minutes each). So I would
> like to use a grid of, say, 1000 points, but still have density
> estimates for all my observations. That is, I want to have a variable
> xdens which contains in observation i
>
> - the exact estimated density if x[i] happens to be a grid point
> - the linear interpolation of the two densities estimated at the
>   closest grid points to the left and right of x[i]
>
> for all 20'000 observations. I was told that this is the default
> behaviour in EViews, but I have really no clue how to best implement
> this in Stata.
>
> Thanks a lot for any suggestions.
> Best regards,
>
> Eva Poen
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
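
For readers who want to see the grid-plus-interpolation idea from this thread in isolation, here is a minimal Python sketch (not Stata, and not the method used by EViews or by either poster). It assumes a Gaussian kernel and a fixed bandwidth supplied by the caller. All function names are hypothetical and chosen only for illustration. The density is estimated once on an equally spaced grid, then each observation gets the exact grid value if it sits on a grid point and otherwise a linear interpolation between its two neighbouring grid points.

```python
import bisect
import math


def gaussian_kde(data, points, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at each of `points`."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - d) / bandwidth) ** 2) for d in data)
        for p in points
    ]


def equally_spaced_grid(lo, hi, n):
    """Return n equally spaced points from lo to hi (requires n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def interpolate(grid, dens, x):
    """Linearly interpolate the gridded density at x; `grid` must be sorted."""
    if x <= grid[0]:
        return dens[0]
    if x >= grid[-1]:
        return dens[-1]
    # Binary search: afterwards grid[j - 1] <= x < grid[j].
    j = bisect.bisect_right(grid, x)
    x0, x1 = grid[j - 1], grid[j]
    w = (x - x0) / (x1 - x0)  # w == 0 when x is exactly a grid point
    return (1.0 - w) * dens[j - 1] + w * dens[j]


def density_at_observations(data, n_grid, bandwidth):
    """Estimate on an n_grid-point grid, then interpolate to every observation."""
    grid = equally_spaced_grid(min(data), max(data), n_grid)
    dens = gaussian_kde(data, grid, bandwidth)
    return [interpolate(grid, dens, x) for x in data]
```

With N observations and a grid of G points, this needs G times N kernel evaluations instead of N times N, plus one binary search per observation. The loop in the message above instead locates each observation's neighbours with a `count if` over the data, which is a plausible reason it was not much faster than the full estimate. That loop also takes the midpoint average of the two neighbouring densities rather than a distance-weighted interpolation.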

