
Re: st: median of consecutive groups - avoiding loops


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: median of consecutive groups - avoiding loops
Date   Thu, 12 May 2011 08:47:46 +0100

Interesting. There are lots of side issues here. Here are a few:

1. That the loop solution is not especially fast is not primarily
because it is a loop, I guess. The code uses -summarize, detail- to
get the median because plain -summarize- does not report it, but that
wastes a lot of effort on also calculating things like the variance,
skewness and kurtosis, which are not of interest here.

2. -egen- usually slows things down because it is interpreted code:
many command lines, often not doing much, need to be interpreted. But
-egen-'s -median()- function is a smart one: it gets the median value
directly from sorted data, and Stata is very fast at sorting.
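
A minimal sketch of both points, not timed and not taken from the
original code. It assumes a numeric variable -value- and an integer
-group-, as in the quoted simulation; the toy data set and the names
-med_pctile-, -lo-, -hi-, -med_sorted- and -med_egen- are made up for
illustration. -_pctile- returns only the requested percentile, and the
-by:- lines read a median off sorted data by subscripting, which is the
general idea behind a sort-based median but not necessarily how -egen-
is coded internally. The second part assumes no missing values in
-value-.

```stata
* Toy data: 10 groups of 100 observations each (illustrative only)
clear
set seed 2011
set obs 1000
gen group = ceil(_n/100)
gen value = rnormal()

* Point 1: -_pctile- leaves just the requested percentile in r(r1),
* without the moments that -summarize, detail- also computes
_pctile value if inlist(group, 1, 2), p(50)
scalar med_pctile = r(r1)
quietly summarize value if inlist(group, 1, 2), detail
assert reldif(med_pctile, r(p50)) < 1e-12

* Point 2: sort within group, then pick the middle value(s) by subscript
* (assumes no missing values in -value-)
bysort group (value): gen double lo = value[ceil(_N/2)]
by group: gen double hi = value[floor(_N/2) + 1]
gen double med_sorted = (lo + hi) / 2

* Check against -egen-'s -median()- function
by group: egen double med_egen = median(value)
assert reldif(med_sorted, med_egen) < 1e-12
```

Whether swapping -_pctile- into the loop makes it competitive with the
-egen- approach would need timing, for example with -timer- as in the
quoted simulation.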

The problem sounds so odd that I am not tempted to work much more at it!

Nick

On Thu, May 12, 2011 at 1:23 AM, daniel klein
<klein.daniel.81@googlemail.com> wrote:
> This question is indeed interesting. Ad hoc simulation shows that the
> answer seems to depend on the number of groups. While the loop
> performs well if the number of groups is small (10), it slows down
> considerably as the number of groups increases (100). The speed of the
> "egen" solution does not seem to depend on the number of groups (all
> runs with N=10,000). I guess Stata did a good job writing the -by-
> prefix. The simulations use equal group sizes. Overall, it seems the
> "egen" solution outperforms the loop.
>
> It would be interesting to see whether one could speed things up using
> Mata (as I would expect). But then again, I guess in "real life" the
> differences will not matter much.
>
> Here's the simulation (syntax is -ahsim obs number_of_groups-).
>
> cap prog drop ahsim
>
> prog ahsim
>        args obs ngroups
>        if "`obs'" == "" loc obs 10000
>        if "`ngroups'" == "" loc ngroups 10
>        clear all
>        qui {
>                set obs `ngroups'
>                g group = _n
>                expand `obs'/`ngroups'
>                sort group
>                g value = rnormal()
>        }
>        di _n "{txt}Groups: `ngroups'"
>        di "{txt}Obs.: " _N
>
>        timer clear
>
>        * Method 1: loop over consecutive pairs of groups (i, i+1)
>        timer on 1
>        su group, meanonly
>        local last = r(max) - 1
>
>        qui gen mymedian = .
>
>        qui forval i = 1/`last' {
>                local j = `i' + 1
>                su value if inlist(group, `i', `j'), detail
>                replace mymedian = r(p50) if group == `i'
>        }
>        timer off 1
>
>        * Method 2: pair each odd group with the next group (newgroup1)
>        * and each even group with the next group (newgroup2), then -egen-
>        timer on 2
>        g int newgroup1 = cond(mod(group, 2), group, group-1)
>        g int newgroup2 = cond(mod(group, 2), group-1, group)
>        bys newgroup1 : egen med1 = median(value)
>        bys newgroup2 : egen med2 = median(value)
>        g median = cond(mod(group, 2), med1, med2)
>        drop newgroup1 newgroup2 med1 med2
>        timer off 2
>
>        timer list
>
>        di _n "{txt}1: loop"
>        di "{txt}2: egen"
> end
