
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: generating dummy variables based on freq of duplicate values


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: generating dummy variables based on freq of duplicate values
Date   Tue, 20 Aug 2013 10:16:26 +0100

Or

bys patientid: replace highfreq= _N>4

Nick
[email protected]


On 20 August 2013 10:11, Willard van Ooij <[email protected]> wrote:

> I may be missing something, but isn't this solution much easier?
>
> gen highfreq=0
> bys patientid: replace highfreq=1 if _N>4
>
> But this only works if Yerik wants just 2 groups, a high and a low frequency group.

Eric A. Booth wrote:

> Take a look at -help egen-, particularly the cut() function.  Here's one way to get what you are asking about:
>
> ********************!begin example
> clear
> set obs 500
> g patientid = trunc(runiform()*50)
>
> bys patientid: egen freq = count(patientid)
> su freq
>
> egen freqcat = cut(freq), at(0 4 10 30) lab
> ta freqcat, miss
>
> ta freqcat, g(cat_)
>
> su cat_?
> ********************!end example
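A note on the cut points, with a hedged sketch: the intervals given to at() in -egen, cut()- are closed on the left, so at(0 4 10 30) puts a frequency of exactly 4 in the second group, and values at or above the last cut point become missing. If the low group should be "4 or fewer", as in the original question, the break has to sit at 5. The cut points 1, 5, 11 and 501 below, the seed, and the use of icodes instead of lab are assumptions made for illustration only.

```stata
clear
set obs 500
set seed 12345                     // arbitrary seed, only for reproducibility
gen patientid = trunc(runiform()*50)

* number of records per patient
bysort patientid: gen freq = _N

* intervals are [1,5), [5,11), [11,501); 501 exceeds any possible count here
egen freqcat = cut(freq), at(1 5 11 501) icodes

* freqcat is coded 0, 1, 2 for the three intervals
tabulate freqcat, missing
```

With these breaks, freqcat is 0 for patients with 1 to 4 records, 1 for 5 to 10, and 2 for 11 or more.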

> On Mon, Aug 19, 2013 at 8:55 PM, Yerik Kaslow <[email protected]> wrote:

>> I am working with a dataset for clinical trials. My data has patient IDs
>> which often repeat; every time they participate in a trial, they are
>> recorded. I want to group the patient IDs into high frequency and low
>> frequency participants, based on the frequency they are involved with
>> the clinical trials. I am trying to write syntax to create a dummy
>> variable based on frequency of duplicate patientIDs.
>>
>> E.g.:
>> Patient ID 6523 appears 2 times
>> Patient ID 7634 appears 10 times
>> Patient ID 8798 appears 4 times
>> Patient ID 9032 appears 21 times
>>
>> I would like to write syntax such that any patient ID with a frequency
>> of <= 4 (or any other value I choose) is assigned value of 0...low
>> frequency patient in this case. Likewise, any patient ID with a
>> frequency of >=5 is assigned a value of 1...high frequency patient.
>>
>> How would I write syntax to say, assign a value of 1/0 based on the
>> number of the same patient IDs in the data?
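
For concreteness, here is a minimal sketch of the one-line approach applied to the four patient IDs listed in the question. The helper variable n, the local macro cutoff, and the use of -expand- to rebuild one record per trial are assumptions made for illustration; real data would already have one observation per participation.

```stata
clear
input patientid n
6523 2
7634 10
8798 4
9032 21
end

* rebuild one observation per trial participation, then drop the helper
expand n
drop n

* threshold is adjustable: more than `cutoff' records counts as high frequency
local cutoff 4
bysort patientid: gen byte highfreq = _N > `cutoff'

tabulate patientid highfreq
```

Under -bysort-, _N is the number of observations in the current patientid group, so the logical expression evaluates to 1 for IDs 7634 and 9032 and to 0 for IDs 6523 and 8798.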

