
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: median of consecutive groups - avoiding loops


From   "Sarah Kristina Reuter" <sarah.kristina.reuter@uni-jena.de>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: median of consecutive groups - avoiding loops
Date   Thu, 12 May 2011 20:41:33 +0200

Daniel and Nick
Thank you so much for your discussion!
Daniel: your proposal works perfectly.

One last note for clarification: I wanted to avoid the loop because the data set is huge. The
-egen- command is much faster, whatever the reason.

Sarah



On 12 May 2011 at 8:47, Nick Cox wrote:

> Interesting. There are lots of side issues here. Here are a few:
> 
> 1. The loop solution is not especially fast, but I guess that is not
> primarily because it is a loop. The code uses -summarize, detail- to
> get the median, because plain -summarize- does not return it, but that
> wastes a lot of effort on also calculating things like variance,
> skewness and kurtosis, which are not of interest here.
> 
> 2. -egen- usually slows things down, because it has to interpret many
> command lines that often do not do much. But -egen-'s -median()-
> function is a smart one: it gets the median value directly from sorted
> data, and Stata is very fast at sorting.
> 
> The problem sounds so odd that I am not tempted to work much more at it!
> 
> Nick
> 
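Nick's first point can be tried out with a small sketch: the same pairwise loop, but with -_pctile- in place of -summarize, detail-, so that only the requested percentile is computed. This is an illustration, not code from the thread. The toy data and the variable name mymedian2 are assumptions made here, and whether it is actually faster on a large data set would need to be timed.

```stata
* Toy data (assumption for illustration): 3 groups of 10, values 1..30.
* Note that -clear- drops whatever data are in memory.
clear
set obs 30
generate group = ceil(_n/10)
generate value = _n

summarize group, meanonly
local last = r(max) - 1
quietly generate double mymedian2 = .

forvalues i = 1/`last' {
    local j = `i' + 1
    * -_pctile- stores the requested percentile in r(r1) and prints nothing
    _pctile value if inlist(group, `i', `j'), percentiles(50)
    quietly replace mymedian2 = r(r1) if group == `i'
}
```

As in the original loop, the last group has no successor, so mymedian2 stays missing there.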
> On Thu, May 12, 2011 at 1:23 AM, daniel klein
> <klein.daniel.81@googlemail.com> wrote:
> > This question is indeed interesting. Ad hoc simulation shows that the
> > answer seems to depend on the number of groups. The loop performs
> > well if the number of groups is small (10), but it slows down
> > considerably as the number of groups increases (100). The speed of
> > the "egen" solution does not seem to depend on the number of groups
> > (all runs with N=10,000). Guess Stata did a good job writing the -by-
> > prefix. Simulations have equal group sizes. Overall it seems the
> > "egen" solution outperforms the loop.
> >
> > Would be interesting if one could speed things up using Mata (as I
> > would expect). But then again, I guess in "real life" the differences
> > will not matter much.
> >
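For readers curious about the Mata idea, here is one way it might look. This is only a sketch under assumptions: the function name pairmedian(), the variable mmed and the toy data are made up here, the data must already be sorted by group, missing values are not handled, and no claim is made about its speed relative to -egen-.

```stata
* Toy data (assumption for illustration): 3 groups of 10, values 1..30.
* Note that -clear- drops whatever data are in memory.
clear
set obs 30
generate group = ceil(_n/10)
generate value = _n

mata:
// median of each group pooled with the next group; g must be sorted
real colvector pairmedian(real colvector g, real colvector v)
{
    real matrix    info
    real colvector res, x
    real scalar    i, n, a, b, med

    info = panelsetup(g, 1)          // first and last row of each group
    res  = J(rows(v), 1, .)
    for (i = 1; i < rows(info); i++) {
        a = info[i, 1]               // first row of group i
        b = info[i + 1, 2]           // last row of group i+1
        x = sort(v[|a \ b|], 1)
        n = rows(x)
        // middle value, or mean of the two middle values if n is even
        med = (x[ceil(n/2)] + x[floor(n/2) + 1]) / 2
        res[|a \ info[i, 2]|] = J(info[i, 2] - a + 1, 1, med)
    }
    return(res)
}
end

sort group
generate double mmed = .
mata: st_store(., "mmed", pairmedian(st_data(., "group"), st_data(., "value")))
```

The last group is left missing, which mirrors the loop in the simulation rather than the -egen- variant.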
> > Here's the simulation (syntax is -ahsim obs number_of_groups-).
> >
> > cap prog drop ahsim
> >
> > prog ahsim
> >        args obs ngroups
> >        if "`obs'" == "" loc obs 10000
> >        if "`ngroups'" == "" loc ngroups 10
> >        clear all
> >        qui {
> >                set obs `ngroups'
> >                g group = _n
> >                expand `obs'/`ngroups'
> >                sort group
> >                g value = rnormal()
> >        }
> >        di _n "{txt}Groups: `ngroups'"
> >        di "{txt}Obs." _N
> >
> >        timer clear
> >
> >        timer on 1
> >        su group, meanonly
> >        local last = r(max) - 1
> >
> >    qui gen mymedian = .
> >
> >    qui forval i = 1/`last' {
> >                local j = `i' + 1
> >        su value if inlist(group, `i', `j') , detail
> >        replace mymedian = r(p50) if group == `i'
> >        }
> >        timer off 1
> >
> >        timer on 2
> >        g int newgroup1 = cond(mod(group, 2), group, group-1)
> >        g int newgroup2 = cond(mod(group, 2), group-1, group)
> >    bys newgroup1 : egen med1 = median(value)
> >    bys newgroup2 : egen med2 = median(value)
> >        g median = cond(mod(group, 2), med1, med2)
> >    drop newgroup1 newgroup2 med1 med2
> >    timer off 2
> >
> >    timer list
> >
> >        di _n "{txt}1: loop"
> >        di "{txt}2: egen"
> > end
> >
> >
> >
> >
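A possible way to run the simulation above, once -ahsim- has been defined. The seed and the final check are additions made here, not part of the thread. The check assumes that -summarize, detail- and -egen-'s -median()- use the same median definition, which I would expect but have not verified against the documentation.

```stata
* Assumes the -ahsim- program above has already been run/defined.
set seed 2011

* 10,000 observations in 10 groups, then in 100 groups
ahsim 10000 10
ahsim 10000 100

* The data from the last run stay in memory. Both approaches should
* agree wherever the loop filled in a value (it skips the last group).
assert reldif(mymedian, median) < 1e-6 if !missing(mymedian)
```

Timer 1 reports the loop and timer 2 the -egen- approach, as the program's own output notes.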
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


-- 
Dipl.-Kffr. Sarah Reuter

Friedrich-Schiller-Universität Jena
Wirtschaftswissenschaftliche Fakultät
Lehrstuhl für Allgemeine Betriebswirtschaftslehre,
insbesondere Finanzierung, Banken und Risikomanagement
Carl-Zeiss-Str. 3
07743 Jena
Tel.: +49 (0)3641 9 43123
Fax: +49 (0)3641 9 43122




