Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: median of consecutive groups - avoiding loops
Date: Thu, 12 May 2011 08:47:46 +0100

Interesting. There are lots of side issues here. Here are a few:

1. That the loop solution is not especially fast is not primarily
because it is a loop, I guess. The code uses -summarize, detail- to
get the median because plain -summarize- does not provide it, but that
causes lots of wasted effort in also calculating things like variance,
skewness and kurtosis, which are not of interest here.

2. -egen- usually slows things down because many command lines that
often do not do much need to be interpreted. But -egen-'s -median()-
function is a smart one and gets the median value directly from sorted
data, and Stata is very fast at sorting.

The problem sounds so odd that I am not tempted to work much more at it!

Nick

On Thu, May 12, 2011 at 1:23 AM, daniel klein <klein.daniel.81@googlemail.com> wrote:
> This question is indeed interesting. Ad hoc simulation shows that the
> answer seems to depend on the number of groups. While the loop
> performs well if the number of groups is small (10), it slows down
> considerably as the number of groups increases (100). The speed of the
> "egen" solution does not seem to depend on the number of groups (all
> runs with N=10,000). Guess Stata did a good job writing the -by-
> prefix. Simulations have equal group sizes. Overall, it seems the
> "egen" solution outperforms the loop.
>
> Would be interesting if one could speed things up using Mata (as I
> would expect). But then again, I guess in "real life" the differences
> will not matter much.
>
> Here's the simulation (syntax is -ahsim obs number_of_groups-).
>
> cap prog drop ahsim
>
> prog ahsim
>     args obs ngroups
>     if "`obs'" == "" loc obs 10000
>     if "`ngroups'" == "" loc ngroups 10
>     clear all
>     // build `ngroups' equal-sized groups of standard normal draws
>     qui {
>         set obs `ngroups'
>         g group = _n
>         expand `obs'/`ngroups'
>         sort group
>         g value = rnormal()
>     }
>     di _n "{txt}Groups: `ngroups'"
>     di "{txt}Obs.: " _N
>
>     timer clear
>
>     // timer 1: loop over pairs of consecutive groups (i, i+1)
>     timer on 1
>     su group, meanonly
>     local last = r(max) - 1
>
>     qui gen mymedian = .
>
>     qui forval i = 1/`last' {
>         local j = `i' + 1
>         su value if inlist(group, `i', `j') , detail
>         replace mymedian = r(p50) if group == `i'
>     }
>     timer off 1
>
>     // timer 2: -egen- over two pairings, (odd, odd+1) and (even, even+1)
>     timer on 2
>     g int newgroup1 = cond(mod(group, 2), group, group-1)
>     g int newgroup2 = cond(mod(group, 2), group-1, group)
>     bys newgroup1 : egen med1 = median(value)
>     bys newgroup2 : egen med2 = median(value)
>     g median = cond(mod(group, 2), med1, med2)
>     drop newgroup1 newgroup2 med1 med2
>     timer off 2
>
>     timer list
>
>     di _n "{txt}1: loop"
>     di "{txt}2: egen"
> end
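
On point 1 above, here is a minimal sketch, not something timed or proposed in this thread, of the same loop with -_pctile- in place of -summarize, detail-. -_pctile- computes only the requested percentiles and leaves the median in r(r1), so it avoids the variance, skewness and kurtosis calculations. The seed, the data sizes (10 groups of 100) and the use of double storage are assumptions made only for illustration. Whether this closes the gap to the -egen- solution is untested here.

```stata
* Illustrative data: 10 equal-sized groups of standard normal draws
* (sizes and seed are assumptions for this sketch, not from the thread)
clear
set seed 12345
set obs 10
gen group = _n
expand 100
sort group
gen value = rnormal()

* Same loop structure as the quoted code: the median of groups i and i+1
* is assigned to group i, so the last group is left missing
summarize group, meanonly
local last = r(max) - 1

gen double mymedian = .
forvalues i = 1/`last' {
    local j = `i' + 1
    * -_pctile- computes just the 50th percentile and stores it in r(r1)
    _pctile value if inlist(group, `i', `j'), p(50)
    quietly replace mymedian = r(r1) if group == `i'
}
```

A quick check is to compare mymedian for one pair of groups with r(p50) from -summarize value if inlist(group, 1, 2), detail-. The two commands should agree under their default percentile definitions.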