Re: st: Equivalent of Excel's COUNTIF


From   Richard Herron <richard.c.herron@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Equivalent of Excel's COUNTIF
Date   Wed, 16 Nov 2011 14:30:39 -0500

The -reshape- solution is much faster than the loop solution: with
10,000 individuals it took about 0.12 seconds against about 13.4
seconds for the loop (timer 1 vs timer 2 below). With 1e5 individuals
the loop solution was beyond my patience.

timer list
   1:      0.12 /        1 =       0.1240
   2:     13.42 /        1 =      13.4210

I had to modify our solutions a little to time them on simulated data.

* begin code
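timer clear

* timer 1: -reshape- solution
timer on 1
clear
set obs 10000
set seed 10101
generate long id = _n
generate datein = runiform()*5000
generate dateout = datein + runiform()*15

* one row per event (arrival or departure), in time order
reshape long date, i(id) j(inout) string
sort date
* +1 on arrival, -1 on departure; the running sum is the current count
tempvar change
generate int `change' = cond(inout == "in", 1, -1)
* subtract 1 so that arrival rows count the others present
generate int total = sum(`change') - 1
timer off 1
timer list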

* timer 2: loop solution
timer on 2
clear
set obs 10000
set seed 10101
generate arrival = runiform()*5000
generate discharge = arrival + runiform()*15
gen long npatients = .
gen long _num_discharged = .
sort arrival
forvalues k=1/`=_N' {
 * running count of discharges at or before arrival k
 quietly replace _num_discharged = sum( discharge <= arrival[`k'] )
 * earlier arrivals minus those already discharged
 quietly replace npatients = (_n-1) - _num_discharged[`k'] in `k'
}
timer off 2

timer list
* end code
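
As a sanity check on the running-sum idea, here is a minimal sketch on
five made-up stays (the data and the variable name -brute- are
illustrative assumptions, not from the thread). It compares the
-reshape- counts with a brute-force -count- at each arrival. Note that
total is only meaningful on the arrival rows; on departure rows it is
the number remaining minus one.

```stata
* begin code
clear
input id datein dateout
1  1  4
2  2  3
3  5  9
4  6  7
5  8 10
end

* brute force: others already in and not yet out at each arrival
generate long brute = .
forvalues k = 1/`=_N' {
    quietly count if datein < datein[`k'] & dateout > datein[`k']
    quietly replace brute = r(N) in `k'
}

* -reshape- and running sum, as in the timed code above
reshape long date, i(id) j(inout) string
sort date
generate int change = cond(inout == "in", 1, -1)
generate int total = sum(change) - 1

* the two counts should agree on every arrival row
assert total == brute if inout == "in"
list id date total if inout == "in", noobs
* end code
```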

On Wed, Nov 16, 2011 at 11:44, Stas Kolenikov <skolenik@gmail.com> wrote:
> On Wed, Nov 16, 2011 at 11:04 AM, Richard Herron
> <richard.c.herron@gmail.com> wrote:
>> Here is an alternative solution with -reshape-, -cond-, and -sum-.
>
> Cute solution!
>
>> The last two functions should be fast at any scale, but I don't have
>> enough experience with Stata to know if -reshape- is faster than a
>> loop.
>
> That's easy to check: set obs 10M instead of 10, and see what will be
> faster (and whether the -reshape- will start breaking down with large
> data sets; it might or it might not). -reshape- appears to be using a
> lot of I/O with explicit -use-, -save- and -merge- in the code; I
> thought this would have been written in C or Mata -- there's a
> reshape() function in Mata.
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
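
Stas's suggestion to vary the number of observations could be sketched
roughly as follows. The sample sizes are arbitrary choices for
illustration, and only the -reshape- steps are timed (data generation
is outside the timer, unlike in the code above). I am not reporting
timings for this version.

```stata
* begin code
timer clear
local t = 0
foreach n of numlist 1000 10000 100000 {
    local ++t
    clear
    quietly set obs `n'
    set seed 10101
    generate long id = _n
    generate datein = runiform()*5000
    generate dateout = datein + runiform()*15

    * time only the reshape, sort, and running sum
    timer on `t'
    quietly reshape long date, i(id) j(inout) string
    sort date
    generate int total = sum(cond(inout == "in", 1, -1)) - 1
    timer off `t'
}
* timers 1, 2, 3 correspond to the three sample sizes in order
timer list
* end code
```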