Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Equivalent of Excel's COUNTIF

From	"Teubner, David (Health)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Equivalent of Excel's COUNTIF
Date	Thu, 17 Nov 2011 10:59:01 +1030

Thank-you everyone for your help - fantastic!

David Teubner 

On 17/11/2011, at 6:45 AM, Robert Picard wrote:

> You can avoid the -reshape- overhead by duplicating each patient
> record directly using -expand-. Also, there is no need to do another
> -reshape- to get back to the original data form, you can simply drop
> the extra record. Here's a reworked example using Stata datetime
> internal format:
> 
> *----------- begin example -------------
> 
> clear all
> set seed 10101
> 
> * the number of milliseconds in a day
> scalar mday = 24 * 60 * 60 * 1000
> 
> * create patient records
> set obs 10000
> gen double timein = clock("2011 1 1","YMD") + runiform()*180*mday
> gen double timeout = timein + runiform()*10*mday
> format timein timeout %tc
> gen id = _n
> 
> * duplicate observations and order by event time
> expand 2
> sort id
> by id: gen double etime = cond(_n==1,timein,timeout)
> by id: gen event = cond(_n==1,1,-1)
> sort etime id
> 
> * the population goes up or down
> gen pop = sum(event) - 1
> 
> * drop the extra record
> drop if event == -1
> drop etime event
> 
> *------------ end example --------------
> 
> 
> 
> On Wed, Nov 16, 2011 at 2:30 PM, Richard Herron
> <[email protected]> wrote:
>> The -reshape- solution is faster than the loop solution (timer 1 vs
>> timer 2 below). With 1e5 individuals the loop solution was beyond my
>> patience.
>> 
>> timer list
>>   1:      0.12 /        1 =       0.1240
>>   2:     13.42 /        1 =      13.4210
>> 
>> I had to modify our solutions a little.
>> 
>> * begin code
>> timer clear
>> timer on 1
>> clear
>> set obs 10000
>> set seed 10101
>> generate long id = _n
>> generate datein = runiform()*5000
>> generate dateout = datein + runiform()*15
>> 
>> reshape long date, i(id) j(inout) string
>> sort date
>> tempvar change
>> generate int `change' = cond(inout == "in", 1, -1)
>> generate int total = sum(`change') - 1
>> timer off 1
>> timer list
>> 
>> timer on 2
>> clear
>> set obs 10000
>> set seed 10101
>> generate arrival = runiform()*5000
>> generate discharge = arrival + runiform()*15
>> gen long npatients = .
>> gen long _num_discharged = .
>> sort arrival
>> forvalues k=1/`=_N' {
>>  quietly replace _num_discharged = sum( discharge <= arrival[`k'] )
>>  quietly replace npatients = (_n-1) - _num_discharged[`k'] in `k'
>> }
>> timer off 2
>> 
>> timer list
>> * end code
>> 
>> On Wed, Nov 16, 2011 at 11:44, Stas Kolenikov <[email protected]> wrote:
>>> On Wed, Nov 16, 2011 at 11:04 AM, Richard Herron
>>> <[email protected]> wrote:
>>>> Here is an alternative solution with -reshape-, -cond-, and -sum-.
>>> 
>>> Cute solution!
>>> 
>>>> The last two functions should be fast at any scale, but I don't have
>>>> enough experience with Stata to know if -reshape- is faster than a
>>>> loop.
>>> 
>>> That's easy to check: set obs 10M instead of 10, and see what will be
>>> faster (and whether the -reshape- will start breaking down with large
>>> data sets; it might or it might not). -reshape- appears to be using a
>>> lot of I/O with explicit -use-, -save- and -merge- in the code; I
>>> thought this would have been written in C or Mata -- there's a
>>> reshape() function in Mata.
>>> 
>>> --
>>> Stas Kolenikov, also found at http://stas.kolenikov.name
>>> Small print: I use this email account for mailing lists only.
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: Equivalent of Excel's COUNTIF
  - From: "Teubner, David (Health)" <[email protected]>
- Re: st: Equivalent of Excel's COUNTIF
  - From: Stas Kolenikov <[email protected]>
- Re: st: Equivalent of Excel's COUNTIF
  - From: Richard Herron <[email protected]>
- Re: st: Equivalent of Excel's COUNTIF
  - From: Stas Kolenikov <[email protected]>
- Re: st: Equivalent of Excel's COUNTIF
  - From: Richard Herron <[email protected]>
- Re: st: Equivalent of Excel's COUNTIF
  - From: Robert Picard <[email protected]>

Prev by Date: st: Fracpoly hangs in Mac OS X version of Stata 11
Next by Date: st: quick question : survival curves with pweighted data in Stata 12
Previous by thread: Re: st: Equivalent of Excel's COUNTIF
Next by thread: st: Graphing medians with xtgraph bug
Index(es):
- Date
- Thread