From
daniel klein <klein.daniel.81@googlemail.com>

To
statalist@hsphsun2.harvard.edu

Subject
Re: st: median of consecutive groups - avoiding loops

Date
Thu, 12 May 2011 02:23:29 +0200

This question is indeed interesting. An ad hoc simulation suggests that the answer depends on the number of groups. The loop performs well when the number of groups is small (10), but it slows down considerably as the number of groups increases (100). The speed of the "egen" solution does not seem to depend on the number of groups (all runs with N = 10,000). I guess Stata did a good job writing the -by- prefix. The simulations use equal group sizes.

Overall, the "egen" solution seems to outperform the loop. It would be interesting to see whether one could speed things up using Mata (as I would expect). But then again, I guess in "real life" the differences will not matter much.

Here's the simulation (syntax is -ahsim obs number_of_groups-).

cap prog drop ahsim
prog ahsim
    args obs ngroups
    if "`obs'" == "" loc obs 10000
    if "`ngroups'" == "" loc ngroups 10

    * simulate `ngroups' equally sized groups
    clear all
    qui {
        set obs `ngroups'
        g group = _n
        expand `obs'/`ngroups'
        sort group
        g value = rnormal()
    }
    di _n "{txt}Groups: `ngroups'"
    di "{txt}Obs." _N

    timer clear

    * 1: loop over consecutive groups
    timer on 1
    su group, meanonly
    local last = r(max) - 1
    qui gen mymedian = .
    qui forval i = 1/`last' {
        local j = `i' + 1
        su value if inlist(group, `i', `j') , detail
        replace mymedian = r(p50) if group == `i'
    }
    timer off 1

    * 2: egen with the -by- prefix
    timer on 2
    g int newgroup1 = cond(mod(group, 2), group, group-1)
    g int newgroup2 = cond(mod(group, 2), group-1, group)
    bys newgroup1 : egen med1 = median(value)
    bys newgroup2 : egen med2 = median(value)
    g median = cond(mod(group, 2), med1, med2)
    drop newgroup1 newgroup2 med1 med2
    timer off 2

    timer list
    di _n "{txt}1: loop"
    di "{txt}2: egen"
end

Best
Daniel
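
The pairing trick in the "egen" solution above can be checked by hand on a toy dataset. The following is only a minimal sketch: the data values and the variable names (pair_odd, pair_even, med_odd, med_even) are made up for illustration and are not part of the simulation.

```stata
* Toy data: 4 groups with 3 observations each (values chosen by hand)
clear
input byte group double value
1 1
1 2
1 3
2 4
2 5
2 6
3 7
3 8
3 9
4 10
4 11
4 12
end

* Pair each odd group with the following even group: (1,2), (3,4), ...
generate int pair_odd = cond(mod(group, 2), group, group - 1)

* Pair each even group with the following odd group: (2,3), (4,5), ...
generate int pair_even = cond(mod(group, 2), group - 1, group)

bysort pair_odd : egen med_odd = median(value)
bysort pair_even : egen med_even = median(value)

* Each group takes the median of itself and its successor
generate median = cond(mod(group, 2), med_odd, med_even)

* One row per group; median is constant within group
tabstat median, by(group) statistics(mean)
```

With these values, groups 1 to 3 should show 3.5, 6.5 and 9.5. Group 4 has no successor, so it gets its own median (11), whereas the loop leaves the last group missing. The two approaches therefore differ for the last group.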

