

From: Nick Cox <njcoxstata@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: median of consecutive groups - avoiding loops

Date: Thu, 12 May 2011 08:47:46 +0100

Interesting. There are lots of side issues here. Here are a few:

1. That the loop solution is not especially fast is not primarily because it is a loop, I guess. The code uses -summarize, detail- to get the median because plain -summarize- does not report it, but that wastes a lot of effort on also calculating things like variance, skewness and kurtosis, which are not of interest here.

2. -egen- usually slows things down because many command lines that often do not do much need to be interpreted. But -egen-'s -median()- function is a smart one: it gets the median value directly from sorted data, and Stata is very fast at sorting.

The problem sounds so odd that I am not tempted to work much more at it!

Nick

On Thu, May 12, 2011 at 1:23 AM, daniel klein <klein.daniel.81@googlemail.com> wrote:

> This question is indeed interesting. Ad hoc simulation shows that the
> answer seems to depend on the number of groups. While the loop
> performs well if the number of groups is small (10), it slows
> down considerably if the number of groups increases (100). The speed of
> the "egen" solution does not seem to depend on the number of groups (all
> runs with N=10,000). Guess Stata did a good job writing the -by- prefix.
> Simulations have equal group sizes. Overall it seems the "egen" solution
> outperforms the loop.
>
> Would be interesting if one could speed things up using Mata (as I
> would expect). But then again, I guess in "real life" the differences
> will not matter much.
>
> Here's the simulation (syntax is -ahsim obs number_of_groups-).
>
> cap prog drop ahsim
>
> prog ahsim
>     args obs ngroups
>     if "`obs'" == "" loc obs 10000
>     if "`ngroups'" == "" loc ngroups 10
>     clear all
>     qui {
>         set obs `ngroups'
>         g group = _n
>         expand `obs'/`ngroups'
>         sort group
>         g value = rnormal()
>     }
>     di _n "{txt}Groups: `ngroups'"
>     di "{txt}Obs." _N
>
>     timer clear
>
>     timer on 1
>     su group, meanonly
>     local last = r(max) - 1
>
>     qui gen mymedian = .
>
>     qui forval i = 1/`last' {
>         local j = `i' + 1
>         su value if inlist(group, `i', `j') , detail
>         replace mymedian = r(p50) if group == `i'
>     }
>     timer off 1
>
>     timer on 2
>     g int newgroup1 = cond(mod(group, 2), group, group-1)
>     g int newgroup2 = cond(mod(group, 2), group-1, group)
>     bys newgroup1 : egen med1 = median(value)
>     bys newgroup2 : egen med2 = median(value)
>     g median = cond(mod(group, 2), med1, med2)
>     drop newgroup1 newgroup2 med1 med2
>     timer off 2
>
>     timer list
>
>     di _n "{txt}1: loop"
>     di "{txt}2: egen"
> end
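Point 1 in the reply above could be checked by keeping the loop but swapping -summarize, detail- for a command that computes only the requested percentile. The following is a minimal sketch, not part of the original thread: it assumes -_pctile- is an acceptable substitute, uses a small made-up dataset in the same shape as the simulation, and the variable name mymedian2 is hypothetical. No timings are claimed here.

```stata
* Sketch only: the loop from the quoted simulation, with -_pctile- in place
* of -summarize, detail-. The data setup mirrors the simulation; mymedian2
* is a hypothetical variable name chosen for this example.
clear
set seed 12345
set obs 10
gen group = _n
expand 100
sort group
gen value = rnormal()

summarize group, meanonly
local last = r(max) - 1

quietly gen mymedian2 = .
forvalues i = 1/`last' {
    local j = `i' + 1
    * -_pctile- computes only the requested percentile and leaves it in r(r1)
    _pctile value if inlist(group, `i', `j'), percentiles(50)
    quietly replace mymedian2 = r(r1) if group == `i'
}
```

As in the quoted loop, the last group has no following group, so its value is left missing. Wrapping this block in -timer on 3- and -timer off 3- inside -ahsim- would be one way to compare it with the two quoted approaches.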

**Follow-Ups**: **Re: st: median of consecutive groups - avoiding loops** *From:* "Sarah Kristina Reuter" <sarah.kristina.reuter@uni-jena.de>

**References**: **Re: st: median of consecutive groups - avoiding loops** *From:* daniel klein <klein.daniel.81@googlemail.com>
