Ugh. How silly of me.

sysuse auto
bysort mpg: gen n=_N
bysort n mpg: gen top10=(_n==1)
replace top10 = sum(top10)
sum top10, meanonly
replace top10 = (top10>=(`r(max)'-9))
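A quick way to convince yourself that the counting logic above ranks by frequency rather than by value is to try it on a tiny string variable, as in the original question. This is only a sketch: the toy data, the cutoff of 2 instead of 10, and the variable name top2 are assumptions made up for illustration. Run it on its own, since -clear- drops whatever is in memory.

```stata
* Toy data (made up): frequencies are a=3, b=2, c=1, d=4
clear
input str1 dx
"a"
"a"
"a"
"b"
"b"
"c"
"d"
"d"
"d"
"d"
end

* n = number of observations sharing each dx value
bysort dx: gen n = _N
* sort by frequency, then tag the first observation of each dx
bysort n dx: gen top2 = (_n == 1)
* the running sum turns the tags into a rank, least frequent first
replace top2 = sum(top2)
sum top2, meanonly
* keep the 2 highest ranks, i.e. the 2 most frequent values
replace top2 = (top2 >= (`r(max)' - 1))

* the two most frequent values here are "a" and "d"
assert top2 == inlist(dx, "a", "d")
```

Ties at the cutoff are broken by the sort order of dx in this version; the second block handles ties explicitly.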

* n already exists if the block above was run, so drop it first
capture drop n
bysort mpg: gen n=_N
bysort n mpg: gen tag=(_n==1)
replace tag = sum(tag)
sum tag , meanonly
gen top10ties = (tag>=(`r(max)'-9))
sum n if tag==(`r(max)'-9), meanonly
replace top10ties = 1 if n==`r(max)'

table mpg top10
table mpg top10ties

On 11/16/2009 7:10 PM, Martin Weiss wrote:

<>

Why is "18", which is the most frequent "mpg" value, assigned a "0" for "top10" in your example? Your code seems to flag the highest values (my initial mistake), and not the most frequent ones...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Winter
Sent: Tuesday, 17 November 2009 00:57
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

No collapsing, no merging, no -egen-:

sysuse auto
bysort mpg: gen top10=(_n==1)
replace top10 = sum(top10)
sum top10, meanonly
replace top10 = (top10>=(`r(max)'-9))

On 11/16/2009 6:37 PM, Martin Weiss wrote:

<>

Good point! I always make up my own dataset according to the description in the initial post, and in this case, my dataset may have been too simple. Still, Elan can -merge- back with the original dataset, with "diagnosis" as her key.

***
sysuse auto, clear
keep mpg
bys mpg: egen mycount=count(mpg)
//collapse to one per group
bys mpg: keep if _n==1
//-sort- on count var
sort mycount
//take the last ten
gen byte mostfreq=inrange(_n,`=_N-9',_N)
//and back as we were
expand mycount
merge m:m mpg /*
*/ using "C:\Program Files (x86)\Stata11\auto.dta", /*
*/ nogenerate nolabel nonotes
***

You need to substitute the path to your auto dataset in the last line...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Tuesday, 17 November 2009 00:03
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

suppose you have data with two vars: name and diagnosis (or make and mpg) and you want to add "top10" dummy to that. You keep one person for each diagnosis. After you -expand- there will be N persons with the same name? Can you show this with auto.dta?

S.R.

On Mon, Nov 16, 2009 at 5:36 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:

<>

What do you want to know? I collapse (fineprint: no hyphens around it as I use -keep- to do it) the thing to be able to -sort- on "mycount" and assign the flag that Elan requested. Once that is done, I want my original data back, so I -expand- it back to its former glory. Any suggestions for improvements are welcome...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Monday, 16 November 2009 23:33
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: Create a flag variable for 10 most frequent values

Martin, could you please explain how -expand- is used here?

Best, Sergiy

On Mon, Nov 16, 2009 at 5:14 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:

<>

Here is a strategy:

*************
clear*
//construct data
set obs 10000
gen dx=1+int(100*runiform())
//see freqs
ta dx
//use ben jann`s -fre-
capture which fre
if _rc ssc install fre
fre dx, desc
//get counts next to "dx"s
bys dx: egen mycount=count(dx)
//collapse to one per group
bys dx: keep if _n==1
//-sort- on count var
sort mycount
//take the last ten
gen byte mostfreq=inrange(_n,`=_N-9',_N)
//and back as we were
expand mycount
//see result
ta myc mostfreq
*************

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Cohen, Elan
Sent: Monday, 16 November 2009 22:25
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: Create a flag variable for 10 most frequent values

Hi all,

I have a string variable dx that represents a patient's diagnosis (about 5,000 unique values). I'd like to create a "top 10 flag" that equals 1 if dx is one of the top 10 most frequent diagnoses and 0 otherwise.

I'm not even sure where to begin. If someone could point me in the right direction, I'd be grateful.

Stata 10, Windows XP

Thank you,

- Elan

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

