Re: st: differentiating between groups of records with same date


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: differentiating between groups of records with same date
Date   Tue, 31 Jul 2012 09:30:10 -0500

See

FAQ     . . . . . . . . . . . . . . . . . . .  Number of distinct observations
        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and G. Longton
        10/08   How do I compute the number of distinct observations?
                http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/


SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q2/12   SJ 12(2):352
        options added to restrict output to variables with a minimum
        or maximum of distinct values

SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command

On Tue, Jul 31, 2012 at 8:27 AM, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
> Nick,
>
> Apologies for the lack of clarity.
>
> For the dataset below, I wish to count the number of distinct proc_type values for each patient on a given surgery_date.
>
> patient_no   cancer_no   diag_date   surgery_date   proc_type
> 9512834      0484360     21may1994   21may1994      H1
> 9512834      0484358     21may1994   21may1994      H2
> 9512834      0483234     26apr2000   21may2000      H1
> 9512834      0483233     26apr2000
> 0000012      0000012     21Jan1999   21Jan1999      H3
> 0000012      0000013     21Jan1999   21Jan1999      H3
> 0000012      0000014     21Jan1999   21Jan1999      H3
>
>
> In my snapshot above, patient_no 0000012 has 3 cancers with a surgery_date of 21Jan1999 but only one proc_type, so my count should be 1. In contrast, patient_no 9512834 has 2 cancers with a surgery_date of 21may1994 and 2 proc_types on that date, so my count should be 2.
>
> Or, put another way: for each surgery date, how many distinct proc_types did each patient have?
>
> Hope this is clearer.
>
> Best wishes
>
> Tim
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: 31 July 2012 14:02
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: differentiating between groups of records with same date
>
> Sorry, but what's your question?
>
> On 31 Jul 2012, at 13:32, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
>
>> Hi Nick,
>>
>> I've been taking a look at the reference you pointed me to, and been
>> experimenting, to see how I would count for each patient, the number
>> of different procedures that took place on the same date.
>>
>> Again I have
>>
>> patient_no   cancer_no   diag_date   surgery_date   proc_type
>> 9512834      0484360     21may1994   21may1994      H1
>> 9512834      0484358     21may1994   21may1994      H2
>> 9512834      0483234     26apr2000   21may2000      H1
>> 9512834      0483233     26apr2000
>> 0000012      0000012     21Jan1999   21Jan1999      H3
>> 0000012      0000013     21Jan1999   21Jan1999      H3
>> 0000012      0000014     21Jan1999   21Jan1999      H3
>>
>> So I want to say that patient 9512834 had 2 different proc_types on
>> 21may1994 and that patient 0000012 had one operation.
>>
>> Best wishes
>>
>> Tim
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Tim Evans
>> Sent: 31 July 2012 10:53
>> To: 'statalist@hsphsun2.harvard.edu'
>> Subject: RE: st: differentiating between groups of records with same date
>>
>> Nick,
>>
>> Thanks for this, a handy piece of code/functionality.
>>
>> Best wishes
>> Tim
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: 30 July 2012 17:50
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: differentiating between groups of records with same date
>>
>> bysort patient_no diag_date: gen freq = _N
>>
>> See also
>>
>> SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>        Q1/02   SJ 2(1):86--102                                 (no commands)
>>        explains the use of the by varlist : construct to tackle
>>        a variety of problems with group structure, ranging from
>>        simple calculations for each of several groups to more
>>        advanced manipulations that use the built-in _n and _N
>>
>>
>> Nick
>>
>> On Mon, Jul 30, 2012 at 10:20 AM, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
>>> Hi all,
>>>
>>> I have a group of patients who are in a dataset of cancers. Each
>>> patient may have more than one cancer diagnosed, and so may be
>>> present in my dataset a number of times. Each patient has a unique
>>> patient identifier, and each cancer has a unique cancer identifier.
>>> Each row of data is cancer specific, but does contain the patient
>>> identifier. It is possible that a patient has 2 cancers diagnosed
>>> on the same day in my dataset. What I would like to do is generate
>>> a flag next to each record to show against each cancer the number
>>> of cancers diagnosed on the same day.
>>>
>>> My data are like this:
>>>
>>> patient_no   cancer_no   diag_date   surgery_date
>>> 9512834      0484360     21may1994   21may1994
>>> 9512834      0484358     21may1994   21may1994
>>> 9512834      0483234     26apr2000   21may2000
>>> 9512834      0483233     26apr2000
>>> 0000057      0000057     19jul2009   19jul2009
>>> 0000060      0000060     02nov2009   24nov2009
>>> 0000074      0000074     21sep2009   22nov2009
>>>
>>>
>>> For example, patient 9512834 had 2 cancers diagnosed on 21may1994
>>> and so for cancer_no 0484360 and 0484358, I would like to generate
>>> a new variable with the value 2 against each record. Similarly,
>>> patient 0000057 has only one cancer diagnosed, and so the new
>>> variable would contain 1.
