
From: "Cohen, Elan" <cohened@upmc.edu>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: RE: RE: st: AW: Create a flag variable for 10 most frequent values

Date: Tue, 17 Nov 2009 10:16:58 -0500

Thank you everyone. I had just finished writing a solution similar to Jeph's but without the generalizations Nick's solution offers. -nmodes- will definitely do the trick.

Thanks again,
- Elan

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Tuesday, November 17, 2009 10:09 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: RE: st: AW: Create a flag variable for 10 most frequent values
>
> I agree with these criteria. In addition, a general solution to this
> should be able to tackle
>
> Missing values
> Weights
> Ties in frequency (e.g. there may not be exactly 10 modes)
>
> As promised earlier, here is an update of -modes- earlier published in
> the STB and the SJ. An update follows in the Stata Journal.
>
> *! NJC 1.4.0 17 November 2009
> * NJC 1.3.0 13 May 2003 (SJ3-2: sg113_1)
> * NJC 1.2.0 15 June 1999
> * NJC 1.1.2 23 December 1998
> * NJC 1.1.1 29 October 1998
> program modes, sort
>     version 8.0
>     syntax varname [if] [in] [fweight aweight/] ///
>     [ , Min(int 0) Nmodes(int 0) GENerate(str) ]
>
>     if "`generate'" != "" {
>         capture confirm new variable `generate'
>         if _rc {
>             di as err "generate() requires new variable name"
>             exit _rc
>         }
>     }
>
>     if `min' & `nmodes' {
>         di as err "may not specify both min() and nmodes()"
>         exit 198
>     }
>
>     quietly {
>         marksample touse, strok
>         count if `touse'
>         if r(N) == 0 error 2000
>
>         tempvar freq
>         if "`exp'" == "" local exp = 1
>         bysort `touse' `varlist' : ///
>             gen double `freq' = sum(`exp') * `touse'
>         by `touse' `varlist' : ///
>             replace `freq' = (_n == _N) * `freq'[_N]
>         label var `freq' "Freq."
>
>         if `min' > 0 {
>             local which "`freq' >= `min'"
>         }
>         else if `nmodes' > 0 {
>             sort `touse' `freq' `varlist'
>             count if `freq'
>             local nmodes = min(`nmodes', r(N))
>             local which "`freq' >= `freq'[_N - `nmodes' + 1]"
>         }
>         else {
>             su `freq', meanonly
>             local max = r(max)
>             local which "`freq' == `max'"
>         }
>
>         count if `which'
>         if r(N) == 0 {
>             di as err "no such modes in data"
>             exit 498
>         }
>     }
>
>     tabdisp `varlist' if `which', c(`freq')
>
>     quietly if "`generate'" != "" {
>         gen byte `generate' = `which' if `touse'
>         bysort `touse' `varlist' (`generate') : ///
>             replace `generate' = `generate'[_N] if `touse'
>     }
>
> end
>
>
> ------------------------------------------------------------------------
> help for modes                        (SJ9-4: sg113_2; SJ3-2: sg113_1)
> ------------------------------------------------------------------------
>
> Tabulation of mode(s)
>
>     modes varname [weight] [if exp] [in range]
>         [ , { min(#) | nmodes(#) } generate(newvar) ]
>
>
> Description
>
>     modes tabulates the mode(s) of varname, that is, the value(s) of
>     varname that occur most frequently. varname may be numeric or
>     string. fweights and aweights are allowed. Missing values are
>     ignored.
>
>     modes is most obviously useful with a discrete or categorical
>     variable. Continuous variables may need to be placed in bins or
>     classes first.
>
>
> Options
>
>     min(#) specifies that all values with a frequency of # or more
>         should be shown.
>
>     nmodes(#) specifies that # modes should be shown. However, if ties
>         in frequency make identification of precisely # modes
>         arbitrary, all such tied modes will be shown. Note that fewer
>         modes will be shown if fewer than # modes exist.
>
>     min() and nmodes() may not be specified together.
>
>     generate(newvar) generates an indicator variable that is missing
>         if varlist is missing or observations are excluded by if or
>         in, 1 whenever the value of varlist is one of the displayed
>         modes, and 0 otherwise.
>
>
> Examples
>
>     . modes rep78
>     . modes rep78 if foreign
>     . modes mpg, min(3)
>     . modes mpg, nmodes(3)
>     . modes turn, nmodes(10) gen(flag)
>
>
> Author
>
>     Nicholas J. Cox, Durham University, U.K.
>     n.j.cox@durham.ac.uk
>
>
> Acknowledgments
>
>     A problem posed by Sylvain Friederich led to the nmodes() option.
>     A problem posed by Elan Cohen led to the generate() option.
>
>
> Also see
>
>     STB: STB-50 sg113
>     Online: help for tabulate, kdensity, egen
>
> Nick
> n.j.cox@durham.ac.uk
>
> Martin Weiss
>
> As discussed last night between me and Sergiy: You want the whole dataset
> with all variables intact plus one that denotes membership in the "club of
> most frequent values of mpg"...
>
> gjhxmu@sina.com
>
> Suppose we need to flag the 5 most frequent values, how about the following
> typings?
>
> sysuse auto, clear
> keep mpg
> bys mpg: egen mycount=count(mpg)
> bys mycount: g num=_n
> gsort num -mycount
> g tag=_n<=5
> bys mycount: egen rank5=max(tag)
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
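
The tie rule described for nmodes() in the help text above can also be sketched with built-in commands, for anyone who has not yet installed the updated -modes-. This is only an illustration of the cutoff logic, not Nick's program: it ignores weights and if/in, and the variable names freq, first, and flag, along with the choice of turn from the auto dataset, are assumptions made for the example.

```stata
* Sketch only: flag the 10 most frequent values of turn, keeping ties.
sysuse auto, clear

* Frequency of each distinct nonmissing value, repeated on every observation.
bysort turn: gen long freq = _N if !missing(turn)

* Mark one observation per distinct value so that each value is ranked once.
egen byte first = tag(turn)

* Tagged observations come first, ordered by descending frequency.
gsort -first -freq turn

* Frequency of the 10th-ranked value (or of the last one, if fewer than 10 exist).
count if first
local k = min(10, r(N))
local cutoff = freq[`k']

* Any value at least as frequent as the cutoff is flagged, so ties are all kept.
gen byte flag = freq >= `cutoff' if !missing(turn)

tabulate turn if flag
```

Because the flag compares each value's frequency with the cutoff rather than taking the first 10 rows, more than 10 values are flagged whenever several share the cutoff frequency, which matches the behaviour the help text describes for nmodes().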

**References**:

- **Re: RE: st: AW: Create a flag variable for 10 most frequent values** *From:* gjhxmu@sina.com
- **AW: RE: st: AW: Create a flag variable for 10 most frequent values** *From:* "Martin Weiss" <martin.weiss1@gmx.de>
- **RE: RE: st: AW: Create a flag variable for 10 most frequent values** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

