



Re: st: Number of people present by date and time


From   Simon <scmoore.lists@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Number of people present by date and time
Date   Sat, 01 Dec 2012 10:02:52 +0000

Dear Nick and Rebecca,

Thanks for your comments and very helpful code. Nick, your code works perfectly, thank you. And thanks for the -gen long obsid = _n-; I might have missed that.

Rebecca, this is unscheduled care: the clinic in question never closes and only very rarely has 0 patients present. But you pretty much caught what I am interested in: the effect of patients in the system (the intoxicated in particular) on wait times for new arrivals.

Simon




On 29/11/2012 19:04, Nick Cox wrote:
Should be easier than I implied. Even if a unique identifier doesn't
exist for each observation, you just create one. For a _big_ dataset,
be careful on variable type.

I am assuming that -arrival- and -depart- are Stata date-times.

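* one identifier per observation; long holds large integers exactly
gen long obsid = _n
* two copies of each observation: one arrival event, one departure event
expand 2
bysort obsid : gen inout = cond(_n == 1, 1, -1)
* date-times need double precision
by obsid : gen double time = cond(_n == 1, arrival, depart)
* running total of arrivals minus departures, in time order
sort time
gen present = sum(inout)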
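
The code above assumes that -arrival- and -depart- are already numeric Stata date-times. If they are held as strings instead, a conversion along the following lines may be needed first. This is a minimal sketch: the string variables arrstr and depstr, their two example values, and the "DMYhms" mask are assumptions for illustration, not details from the thread.

```stata
* Hypothetical example data: date-times stored as strings
clear
input str20 arrstr str20 depstr
"29nov2012 10:00:00" "29nov2012 11:00:00"
"29nov2012 10:30:00" "29nov2012 12:00:00"
end

* Date-times are milliseconds since 01jan1960, so store them as double
gen double arrival = clock(arrstr, "DMYhms")
gen double depart  = clock(depstr, "DMYhms")
format arrival depart %tc

* clock() returns missing when a string does not match the mask
assert !missing(arrival, depart)
```

Storing the result as float rather than double would round the millisecond values and could reorder events that are close together in time.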

Two simple checks on logic and data quality:

1. The number in the clinic should never be negative.

2. The number in the clinic should be zero when the clinic is closed.
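
The first check, together with the data problems that would most likely cause it to fail, could be coded along these lines. This is a sketch on the toy data from the thread, not part of the posted code. The second check is left out because it needs opening hours, which are not given here.

```stata
* Toy data from the thread (plain numbers standing in for date-times)
clear
input arrival depart id
1000 1100 1
1030 1200 2
1230 1300 3
end

* Data quality: no missing times, and nobody leaves before arriving
assert !missing(arrival, depart)
assert depart >= arrival

* Event structure and running count, as in the code above
gen long obsid = _n
expand 2
bysort obsid : gen inout = cond(_n == 1, 1, -1)
by obsid : gen double time = cond(_n == 1, arrival, depart)
sort time
gen present = sum(inout)

* Check 1: the number in the clinic is never negative
assert present >= 0
```

-assert- stops with an error at the first violation, so in a do-file any failure is hard to overlook.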

Nick

On Thu, Nov 29, 2012 at 2:01 PM, Nick Cox <njcoxstata@gmail.com> wrote:
Each observation is, I gather, a patient. One technique is to make
each observation an arrival or departure. For a very simple toy
dataset with just times for one day:

. l

      +-----------------------+
      | arrival   depart   id |
      |-----------------------|
   1. |    1000     1100    1 |
   2. |    1030     1200    2 |
   3. |    1230     1300    3 |
      +-----------------------+

. expand 2
(3 observations created)

. bysort id : gen inout = cond(_n == 1, 1, -1)

. by id : gen time = cond(_n == 1, arrival, depart)

. sort time

. l

      +--------------------------------------+
      | arrival   depart   id   inout   time |
      |--------------------------------------|
   1. |    1000     1100    1       1   1000 |
   2. |    1030     1200    2       1   1030 |
   3. |    1000     1100    1      -1   1100 |
   4. |    1030     1200    2      -1   1200 |
   5. |    1230     1300    3       1   1230 |
      |--------------------------------------|
   6. |    1230     1300    3      -1   1300 |
      +--------------------------------------+

. gen present = sum(inout)

. l, sep(0)

      +------------------------------------------------+
      | arrival   depart   id   inout   time   present |
      |------------------------------------------------|
   1. |    1000     1100    1       1   1000         1 |
   2. |    1030     1200    2       1   1030         2 |
   3. |    1000     1100    1      -1   1100         1 |
   4. |    1030     1200    2      -1   1200         0 |
   5. |    1230     1300    3       1   1230         1 |
   6. |    1230     1300    3      -1   1300         0 |
      +------------------------------------------------+

This is only one trick, and others will depend on your data. For
example, if your clinic is only open daily, you may be able to, or
need to, exploit that. If patients can come to a clinic more than once
a day, that will provide a complication.

All told, you should not need loops here. The two keys are likely to
be (1) the best data structure and (2) heavy use of -by:-.
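
Applied to the original question, the same structure gives the number present at each patient's arrival. A minimal sketch on the toy data above follows. The variable name others and the choice to process departures before arrivals at tied times are assumptions rather than part of the posted code, and the sketch assumes that every stay has positive length.

```stata
* Toy data from the example above
clear
input arrival depart id
1000 1100 1
1030 1200 2
1230 1300 3
end

* One arrival event and one departure event per patient
expand 2
bysort id : gen inout = cond(_n == 1, 1, -1)
by id : gen double time = cond(_n == 1, arrival, depart)

* Assumed tie-break: at identical times, departures (-1) sort first
sort time inout
gen present = sum(inout)

* Keep arrival events only: back to one observation per patient
keep if inout == 1

* present includes the arriving patient, so subtract 1 for the others
gen others = present - 1
sort id
list id arrival depart others
```

With one observation per patient again, others can be compared directly with a wait-time variable held in the same observation.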

Nick

On Thu, Nov 29, 2012 at 1:35 PM, Simon <scmoore.lists@googlemail.com> wrote:

This is quite possibly a rather naive question, but for some reason I am
stuck.

I have data from a clinic. I have the time each patient checks in
(arrdatetime), the time they leave (depdatetime) and the time taken to
first consultation (waittime) in minutes. What I would like to do is
compare the number of people in the clinic for each patient at
arrdatetime with waittime.

So far the best I can come up with is to write a loop, going through
every patient's arrdatetime and counting up those whose arrival and
departure times span this value. But I have rather a lot of data and
this seems terribly inefficient.
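
For reference, the loop described here might look like the sketch below. The toy values (taken from the example in the reply above rather than real date-times), the variable npresent, and the convention that a patient counts as present from arrival up to but not including departure are all assumptions. It makes one pass over the data per patient, which is why it scales poorly, but on a small sample it can serve as a cross-check.

```stata
* Toy values standing in for real date-times
clear
input arrdatetime depdatetime
1000 1100
1030 1200
1230 1300
end

* For each patient, count stays that span that patient's arrival time
gen long npresent = .
forvalues i = 1/`=_N' {
    quietly count if arrdatetime <= arrdatetime[`i'] & depdatetime > arrdatetime[`i']
    quietly replace npresent = r(N) in `i'
}

* npresent includes the arriving patient
list
```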
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

