Statalist



AW: st: AW: Create a flag variable for 10 most frequent values


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: AW: Create a flag variable for 10 most frequent values
Date   Tue, 17 Nov 2009 09:24:02 +0100

<> 

Bottom line so far: It takes a bit of ingenuity to do what Elan initially
requested, unless somebody comes along with a neat solution...
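
One candidate, offered as a minimal sketch rather than anything posted in the thread: count the observations per value, rank the distinct values by that count, and flag the top ones, all in place. The names k, freq, tag, rank, and top are illustrative, and k is set to 3 only to keep the auto.dta output short. Nothing here depends on the storage type of the variable, so the same steps should apply to a string variable such as dx. Ties at the cutoff are broken arbitrarily (here by the value itself).

```stata
* Sketch: flag the k most frequent values of mpg without collapsing,
* merging, or expanding. All new variable names are illustrative.
sysuse auto, clear
local k 3

* number of observations sharing each mpg value
bysort mpg: gen long freq = _N

* mark one observation per distinct value
by mpg: gen byte tag = (_n == 1)

* put the tagged observations first, most frequent value on top;
* ties in frequency are broken by mpg
gsort -tag -freq mpg
gen long rank = _n if tag

* copy each value's rank to all of its observations
* (missing ranks sort last within mpg, so rank[1] is the tagged one)
bysort mpg (rank): replace rank = rank[1]

gen byte top = (rank <= `k')

* self-check: exactly k distinct values should carry the flag
count if tag & top
assert r(N) == `k'

tabulate mpg top
```

Note that the sort order of the data changes; if the original order matters, store `_n` in a variable beforehand and sort on it at the end.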



HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Tuesday, 17 November 2009 01:50
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

On Mon, Nov 16, 2009 at 6:56 PM, Nick Winter <nwinter@virginia.edu> wrote:
> No collapsing, no merging, no -egen-:
>
> sysuse auto
> bysort mpg: gen top10=(_n==1)
> replace top10 = sum(top10)
> sum top10, meanonly
> replace top10 = (top10>=(`r(max)'-9))
>

and not a solution to the problem posed.

Rather, it is a solution to a different problem. It assumes that the
dataset is already collapsed, and it gives you the "highest" values,
not the most frequent ones. For auto.dta (with the top 3 instead of the
top 10), your program produces:

.....

 66. | Honda Civic          28       0 |
 67. | Chev. Chevette       29       0 |
 68. | Dodge Colt           30       0 |
 69. | Mazda GLC            30       0 |
 70. | Toyota Corolla       31       0 |
 71. | Plym. Champ          34       1 |
 72. | Datsun 210           35       1 |
 73. | Subaru               35       1 |
 74. | VW Diesel            41       1 |
     +---------------------------------+

Clearly, 34, 35, and 41 are the _highest_ values of mpg, but not the
most _frequent_ ones.
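
A quick way to see the difference, as a hedged sketch (the variable name freq is illustrative): compare how often the three highest values occur with the frequency-ordered table for the whole variable.

```stata
* Sketch: contrast "highest" with "most frequent" values of mpg.
sysuse auto, clear

* number of observations sharing each mpg value (freq is an illustrative name)
bysort mpg: gen long freq = _N

* frequencies of the highest values from the listing above (34, 35, 41)
tabulate mpg if mpg >= 34

* all values, most frequent first
tabulate mpg, sort
```

The values at the top of the second table are the ones a "most frequent" flag should pick up.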


Sergiy




>
> On 11/16/2009 6:37 PM, Martin Weiss wrote:
>>
>> <>
>>
>> Good point! I always make up my own dataset according to the description in
>> the initial post, and in this case, my dataset may have been too simple.
>> Still, Elan can -merge- back with the original dataset, with "diagnosis" as
>> her key.
>>
>> ***
>> sysuse auto, clear
>> keep mpg
>>
>> bys mpg: egen mycount=count(mpg)
>>
>> //collapse to one per group
>> bys mpg: keep if _n==1
>> //-sort- on count var
>> sort mycount
>> //take the last ten
>> gen byte mostfreq=inrange(_n,`=_N-9',_N)
>> //and back as we were
>> expand mycount
>>
>> merge m:m mpg using "C:\Program Files (x86)\Stata11\auto.dta", nogenerate nolabel nonotes
>> ***
>>
>>
>> You need to substitute the path to your auto dataset in the last line...
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
>> Sent: Tuesday, 17 November 2009 00:03
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>>
>> Suppose you have data with two variables: name and diagnosis (or make and mpg),
>> and you want to add a "top10" dummy to that.
>> You keep one person for each diagnosis.
>> After you -expand-, will there be N persons with the same name?
>> Can you show this with auto.dta?
>> S.R.
>>
>>
>>
>>
>> On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>>>
>>> <>
>>>
>>> What do you want to know? I collapse (fine print: no hyphens around it, as I
>>> use -keep- to do it) the thing to be able to -sort- on "mycount" and assign
>>> the flag that Elan requested. Once that is done, I want my original data
>>> back, so I -expand- it back to its former glory. Any suggestions for
>>> improvements are welcome...
>>>
>>>
>>>
>>> HTH
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
>>> Sent: Monday, 16 November 2009 23:33
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: AW: Create a flag variable for 10 most frequent values
>>>
>>> Martin, could you please explain how -expand- is used here?
>>> Best, Sergiy
>>>
>>> On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>>>>
>>>> <>
>>>>
>>>> Here is a strategy:
>>>>
>>>>
>>>> *************
>>>> clear*
>>>>
>>>> //construct data
>>>> set obs 10000
>>>> gen dx=1+int(100*runiform())
>>>>
>>>> //see freqs
>>>> ta dx
>>>> //use Ben Jann's -fre-
>>>> capture which fre
>>>> if _rc ssc install fre
>>>> fre dx, desc
>>>>
>>>> //get counts next to "dx"s
>>>> bys dx: egen mycount=count(dx)
>>>>
>>>> //collapse to one per group
>>>> bys dx: keep if _n==1
>>>> //-sort- on count var
>>>> sort mycount
>>>> //take the last ten
>>>> gen byte mostfreq=inrange(_n,`=_N-9',_N)
>>>> //and back as we were
>>>> expand mycount
>>>>
>>>> //see result
>>>> ta myc mostfreq
>>>> *************
>>>>
>>>>
>>>>
>>>> HTH
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Cohen, Elan
>>>> Sent: Monday, 16 November 2009 22:25
>>>> To: 'statalist@hsphsun2.harvard.edu'
>>>> Subject: st: Create a flag variable for 10 most frequent values
>>>>
>>>> Hi all,
>>>>
>>>> I have a string variable dx that represents a patient's diagnosis (about
>>>> 5,000 unique values).  I'd like to create a "top 10 flag" that equals 1 if
>>>> dx is one of the top 10 most frequent diagnoses and 0 otherwise.
>>>>
>>>> I'm not even sure where to begin.  If someone could point me in the right
>>>> direction, I'd be grateful.  Stata 10, Windows XP
>>>>
>>>> Thank you,
>>>>
>>>> - Elan
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>
> --
> --------------------------------------------------------------
> Nicholas Winter                                 434.924.6994 t
> Assistant Professor                             434.924.3359 f
> Department of Politics                  nwinter@virginia.edu e
> University of Virginia          faculty.virginia.edu/nwinter w
> PO Box 400787, 100 Cabell Hall
> Charlottesville, VA 22904



© Copyright 1996–2014 StataCorp LP