Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Robert Picard <picard@netbox.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Equivalent of Excel's COUNTIF |

Date |
Wed, 16 Nov 2011 15:15:49 -0500 |

You can avoid the -reshape- overhead by duplicating each patient record directly using -expand-. Also, there is no need to do another -reshape- to get back to the original data form, you can simply drop the extra record. Here's a reworked example using Stata datetime internal format: *----------- begin example ------------- clear all set seed 10101 * the number of milliseconds in a day scalar mday = 24 * 60 * 60 * 1000 * create patient records set obs 10000 gen double timein = clock("2011 1 1","YMD") + runiform()*180*mday gen double timeout = timein + runiform()*10*mday format timein timeout %tc gen id = _n * duplicate observations and order by event time expand 2 sort id by id: gen double etime = cond(_n==1,timein,timeout) by id: gen event = cond(_n==1,1,-1) sort etime id * the population goes up or down gen pop = sum(event) - 1 * drop the extra record drop if event == -1 drop etime event *------------ end example -------------- On Wed, Nov 16, 2011 at 2:30 PM, Richard Herron <richard.c.herron@gmail.com> wrote: > The -reshape- solution is faster than the loop solution (timer 1 vs > timer 2 below). With 1e5 individuals the loop solution was beyond my > patience. > > timer list > 1: 0.12 / 1 = 0.1240 > 2: 13.42 / 1 = 13.4210 > > I had to modify our solutions a little. > > * begin code > timer clear > timer on 1 > clear > set obs 10000 > set seed 10101 > generate long id = _n > generate datein = runiform()*5000 > generate dateout = datein + runiform()*15 > > reshape long date, i(id) j(inout) string > sort date > tempvar change > generate int `change' = cond(inout == "in", 1, -1) > generate int total = sum(`change') - 1 > timer off 1 > timer list > > timer on 2 > clear > set obs 10000 > set seed 10101 > generate arrival = runiform()*5000 > generate discharge = arrival + runiform()*15 > gen long npatients = . > gen long _num_discharged = . > sort arrival > forvalues k=1/`=_N' { > quietly replace _num_discharged = sum( discharge <= arrival[`k'] ) > quietly replace npatients = (_n-1) - _num_discharged[`k'] in `k' > } > timer off 2 > > timer list > * end code > > On Wed, Nov 16, 2011 at 11:44, Stas Kolenikov <skolenik@gmail.com> wrote: >> On Wed, Nov 16, 2011 at 11:04 AM, Richard Herron >> <richard.c.herron@gmail.com> wrote: >>> Here is an alternative solution with -reshape-, -cond-, and -sum-. >> >> Cute solution! >> >>> The last two functions should be fast at any scale, but I don't have >>> enough experience with Stata to know if -reshape- is faster than a >>> loop. >> >> That's easy to check: set obs 10M instead of 10, and see what will be >> faster (and whether the -reshape- will start breaking down with large >> data sets; it might or it might not). -reshape- appears to be using a >> lot of I/O with explicit -use-, -save- and -merge- in the code; I >> thought this would have been written in C or Mata -- there's a >> reshape() function in Mata. >> >> -- >> Stas Kolenikov, also found at http://stas.kolenikov.name >> Small print: I use this email account for mailing lists only. >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ >> > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: Equivalent of Excel's COUNTIF***From:*"Teubner, David (Health)" <david.teubner@health.sa.gov.au>

**References**:**st: Equivalent of Excel's COUNTIF***From:*"Teubner, David (Health)" <david.teubner@health.sa.gov.au>

**Re: st: Equivalent of Excel's COUNTIF***From:*Stas Kolenikov <skolenik@gmail.com>

**Re: st: Equivalent of Excel's COUNTIF***From:*Richard Herron <richard.c.herron@gmail.com>

**Re: st: Equivalent of Excel's COUNTIF***From:*Stas Kolenikov <skolenik@gmail.com>

**Re: st: Equivalent of Excel's COUNTIF***From:*Richard Herron <richard.c.herron@gmail.com>

- Prev by Date:
**Re: st: regular expression or some simpler data extraction method** - Next by Date:
**Re: st: regular expression or some simpler data extraction method** - Previous by thread:
**Re: st: Equivalent of Excel's COUNTIF** - Next by thread:
**Re: st: Equivalent of Excel's COUNTIF** - Index(es):