The Stata listserver

st: kernel density estimation in a large dataset

From   Kit Baum <[email protected]>
To   [email protected]
Subject   st: kernel density estimation in a large dataset
Date   Tue, 16 Nov 2004 14:02:07 -0500

I agree with Nick's comments on this, but just for the record, it need not take that long if you can get away from the "in i" style of syntax, which slows Stata down tremendously. Here is some code that massages 28,534 observations in 28.5 seconds -- not quite 3 seconds, but a factor of 10 for interpreted code (kdensity, ipolate) versus compiled code (as eViews is doubtless using) is not too shabby. I have seen a factor of 100 in favor of compiled code in other software, so a 10:1 speed disadvantage for interpreted code is reasonable.
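By way of contrast, the kind of per-observation "in i" loop that bogs Stata down might look something like this. This is a hypothetical sketch, not the original poster's code: it brute-forces a Gaussian kernel estimate at each observation, assuming a local `h' already holds a bandwidth computed beforehand. Each pass through the loop touches the entire dataset, so the work is O(N^2):

* hypothetical slow variant: one full pass over the data per observation
gen double dens_slow = .
local N = _N
quietly forvalues i = 1/`N' {
    tempvar k
    * Gaussian kernel weights of every observation relative to obs i
    gen double `k' = normden((ln_wage - ln_wage[`i'])/`h')
    su `k', meanonly
    * kernel density estimate at x_i: (1/(N*h)) * sum of kernel weights
    replace dens_slow = r(sum)/(`N'*`h') in `i'
    drop `k'
}

The vectorized approach below avoids this entirely: kdensity evaluates the density once on a coarse grid, and ipolate fills in the values at the actual data points.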

Hey, and Vince was right about speeding up graphics -- generating a graph of the 28,534 points took only 3.2 seconds!

Lest you think these timings come from some supercomputer, they don't -- they were run on a 1 GHz laptop, which is, in dog years, dead.

Kit Baum, Boston College Economics

set rmsg on
* estimate the density once on a 1,000-point grid (fast, vectorized)
kdensity ln_wage, n(1000) gen(grid dens) nograph
* record the sample size in a local so it cannot be clobbered
* by later r-class commands
su ln_wage, meanonly
local N = r(N)
* append the 1,000 grid points below the existing observations
local new = `N' + 1000
set obs `new'
g newwage = grid[_n-`N']
g newdens = dens[_n-`N']
replace newwage = ln_wage if newwage==.
* sort data points and grid points together, then interpolate the
* gridded density onto the original data points
sort newwage
g n = _n
ipolate newdens n, gen(new2dens)
su newwage new2dens if ln_wage<.
twoway line new2dens newwage if ln_wage<.

