The Stata listserver

st: Re: Kernel density estimation in a large dataset


From   "Eva Poen" <[email protected]>
To   <[email protected]>
Subject   st: Re: Kernel density estimation in a large dataset
Date   Tue, 16 Nov 2004 19:17:02 +0100

Thanks a lot for this suggestion. I am not sure whether I need equally
spaced intervals for the density estimate (this seems to be standard). I
ended up doing

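* density on an equally spaced grid of 1000 points
kdensity x, n(1000) gen(grid dens)

sort grid
gen density = .
forvalues i = 2/19426 {
   * number of grid points strictly below x[i]
   qui count if grid < x[`i']
   * average of the densities at the two neighbouring grid points
   qui replace density = (dens[r(N)] + dens[r(N)+1])/2 in `i'
}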

While this approach works, it turns out that it takes nearly as long as
computing the density for all observations in the first place. In the
meantime, I tried this in EViews (with exactly the same data, bandwidth
and N) and found that density estimation and interpolation take about 3
seconds (!) in EViews, while Stata takes about 10 minutes overall. I was
very surprised by this huge difference in speed.
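
The interpolation step itself does not seem to need a loop. Because the
grid that kdensity returns with n() is equally spaced, the left-hand grid
neighbour of each x can be computed arithmetically and then used as an
explicit subscript. What follows is only a sketch under assumptions, not a
tested replacement: the example data and the names j, w, density_lin, g_lo,
g_n and g_step are made up for illustration, and rnormal() needs a newer
Stata than 8.2. Unlike the loop above, which averages the two neighbouring
densities, the sketch weights them by distance (linear interpolation).

```stata
* Hypothetical example data; in practice x is the variable already in memory.
* rnormal() requires a newer Stata than 8.2.
clear
set obs 20000
set seed 2004
generate double x = rnormal()

* Density on a 1000-point grid; grid and dens fill observations 1/1000.
kdensity x, n(1000) generate(grid dens) nograph

* Grid origin, size and spacing, read off the grid itself.
summarize grid, meanonly
scalar g_lo   = r(min)
scalar g_n    = r(N)
scalar g_step = (r(max) - r(min)) / (r(N) - 1)

* Left-hand grid neighbour of each x, clamped so that j+1 is still a grid point.
generate long j = floor((x - g_lo) / g_step) + 1
replace j = max(1, min(j, g_n - 1))

* Linear interpolation between grid points j and j+1.
generate double w = (x - grid[j]) / g_step
generate double density_lin = (1 - w) * dens[j] + w * dens[j + 1]
```

Every step here is a single pass over the data, so the cost is dominated by
the kdensity call on the 1000-point grid. On a small subsample the result
can be compared with the exact values from kdensity x, at(x).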

Thanks again and best wishes,
Eva




Nichols, Austin wrote:

> You could
> . sort x
> . gen y=x if mod(_n,20)==0 | _n==1 | _n==_N
> . kdensity x, at(y) gen(xdens)
> . ipolate xdens x, gen(f)
>
> for example.
>
> -----Original Message-----
> From: Eva Poen [mailto:[email protected]]
> Sent: Tuesday, November 16, 2004 10:55 AM
> To: [email protected]
> Subject: st: Kernel density estimation in a large dataset
>
>
> Dear all,
>
> I want to do kernel density estimation and local polynomial regression
> on a dataset with 20'000 observations using Stata 8.2. Computations
> using all observations as a grid, like in
>
> - kdensity x, at(x) gen(xdens) -
>
> take quite a long time (between 10 and 15 minutes each). So I would
> like to use a grid of, say, 1000 points, but still have density
> estimates for all my observations. That is, I want to have a variable
> xdens which contains in observation i
>
> - the exact estimated density if x[i] happens to be a grid point
> - the linear interpolation of the two densities estimated at the
> closest grid points to the left and right of x[i]
>
> for all 20'000 observations. I was told that this is the default
> behaviour in EViews, but I have really no clue how to best implement
> this in Stata.
>
> Thanks a lot for any suggestions.
> Best regards,
>
> Eva Poen
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



