



Re: st: median of consecutive groups - avoiding loops


From   daniel klein <klein.daniel.81@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: median of consecutive groups - avoiding loops
Date   Thu, 12 May 2011 02:23:29 +0200

This question is indeed interesting. An ad hoc simulation suggests that
the answer depends on the number of groups. The loop performs well when
the number of groups is small (10), but it slows down considerably as
the number of groups increases (100). The speed of the "egen" solution
does not seem to depend on the number of groups (all runs with
N = 10,000 and equal group sizes). I guess Stata did a good job writing
the -by- prefix. Overall, the "egen" solution seems to outperform the
loop.

It would be interesting to see whether Mata could speed things up (as I
would expect). But then again, I guess in "real life" the differences
will not matter much.
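
For anyone who wants to try the Mata route, here is a rough, untimed
sketch of what it might look like. It is only an illustration, not a
benchmarked solution: the function name pairmedian() and the variable
matamed are made up for this example, the data must be sorted by group,
and missing values are assumed away. It uses Mata's panelsetup() to find
the first and last row of each group, then takes the median of each
pair of consecutive groups by sorting.

```stata
clear all
set seed 2011
set obs 4
gen group = _n
expand 5
sort group
gen double value = rnormal()

mata:
// hypothetical helper: for each group g (except the last), the median
// of value over groups g and g+1; assumes data sorted by group and no
// missing values
real colvector pairmedian(real colvector group, real colvector value)
{
	real matrix    info
	real colvector res, v
	real scalar    i, n, G

	info = panelsetup(group, 1)     // first and last row of each group
	G    = rows(info)
	res  = J(rows(value), 1, .)     // last group stays missing
	for (i = 1; i < G; i++) {
		v = sort(value[|info[i, 1] \ info[i + 1, 2]|], 1)
		n = rows(v)
		// middle value if n is odd, mean of the two middle values if even
		res[|info[i, 1] \ info[i, 2]|] = J(info[i, 2] - info[i, 1] + 1, 1, (v[ceil(n/2)] + v[floor(n/2) + 1])/2)
	}
	return(res)
}
end

gen double matamed = .
mata: st_store(., "matamed", pairmedian(st_data(., "group"), st_data(., "value")))

* spot check against -summarize- for the first pair of groups
quietly summarize value if inlist(group, 1, 2), detail
assert reldif(matamed, r(p50)) < 1e-8 if group == 1
```

Whether this is actually faster than -egen- would have to be checked
with -timer-; I have not done so here.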

Here's the simulation (syntax is -ahsim obs number_of_groups-).

cap prog drop ahsim

prog ahsim
	args obs ngroups
	if "`obs'" == "" loc obs 10000
	if "`ngroups'" == "" loc ngroups 10
	clear all
	qui {
		set obs `ngroups'
		g group = _n
		expand `obs'/`ngroups'
		sort group
		g value = rnormal()
	}
	di _n "{txt}Groups: `ngroups'"
	di "{txt}Obs.: " _N

	timer clear
	
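	* solution 1: loop over consecutive pairs of groups
	* (the last group has no successor, so mymedian stays missing there)
	timer on 1
	su group, meanonly
	local last = r(max) - 1

	qui gen mymedian = .

	qui forval i = 1/`last' {
		local j = `i' + 1
		su value if inlist(group, `i', `j') , detail
		replace mymedian = r(p50) if group == `i'
	}
	timer off 1

	* solution 2: egen with two overlapping pairings of the groups
	timer on 2
	* newgroup1 pairs groups (1,2), (3,4), ...
	g int newgroup1 = cond(mod(group, 2), group, group-1)
	* newgroup2 pairs groups (2,3), (4,5), ...
	g int newgroup2 = cond(mod(group, 2), group-1, group)
	bys newgroup1 : egen med1 = median(value)
	bys newgroup2 : egen med2 = median(value)
	* odd groups take their median from newgroup1, even groups from newgroup2
	g median = cond(mod(group, 2), med1, med2)
	drop newgroup1 newgroup2 med1 med2
	timer off 2

	timer list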
	
	di _n "{txt}1: loop"
	di "{txt}2: egen"
end
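
As a sanity check on the two approaches, the following small sketch
runs both on a toy dataset and compares the results. The variable names
(loopmed, pairodd, paireven, medodd, medeven, egenmed) and the seed are
chosen for this example only. It also shows one difference between the
solutions: the loop leaves the last group missing, because that group
has no successor, while the "egen" solution fills it in with the median
of whatever ends up in the last group's pairing.

```stata
clear
set seed 12345
set obs 6
gen group = _n
expand 5
sort group
gen value = rnormal()

* loop: median of groups i and i+1, stored on group i
gen loopmed = .
summarize group, meanonly
local G = r(max)
local last = `G' - 1
forvalues i = 1/`last' {
	local j = `i' + 1
	quietly summarize value if inlist(group, `i', `j'), detail
	quietly replace loopmed = r(p50) if group == `i'
}

* egen: two overlapping pairings, (1,2), (3,4), ... and (2,3), (4,5), ...
gen int pairodd  = cond(mod(group, 2), group, group - 1)
gen int paireven = cond(mod(group, 2), group - 1, group)
bysort pairodd  : egen medodd  = median(value)
bysort paireven : egen medeven = median(value)
gen egenmed = cond(mod(group, 2), medodd, medeven)

* both solutions agree for every group except the last one
assert reldif(loopmed, egenmed) < 1e-6 if group < `G'
assert missing(loopmed) if group == `G'
```

With six groups, the last group is even and sits alone in its pairing,
so egenmed there is just the median of group 6 itself.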



Best
Daniel