
From: "Nick Cox" <[email protected]>

To: <[email protected]>

Subject: RE: RE: st: AW: Create a flag variable for 10 most frequent values

Date: Tue, 17 Nov 2009 15:08:39 -0000

I agree with these criteria. In addition, a general solution to this should be able to tackle

- Missing values
- Weights
- Ties in frequency (e.g. there may not be exactly 10 modes)

As promised earlier, here is an update of -modes- earlier published in the STB and the SJ. An update follows in the Stata Journal.

    *! NJC 1.4.0 17 November 2009
    * NJC 1.3.0 13 May 2003 (SJ3-2: sg113_1)
    * NJC 1.2.0 15 June 1999
    * NJC 1.1.2 23 December 1998
    * NJC 1.1.1 29 October 1998
    program modes, sort
        version 8.0
        syntax varname [if] [in] [fweight aweight/] ///
        [ , Min(int 0) Nmodes(int 0) GENerate(str) ]

        if "`generate'" != "" {
            capture confirm new variable `generate'
            if _rc {
                di as err "generate() requires new variable name"
                exit _rc
            }
        }

        if `min' & `nmodes' {
            di as err "may not specify both min() and nmodes()"
            exit 198
        }

        quietly {
            marksample touse, strok
            count if `touse'
            if r(N) == 0 error 2000

            tempvar freq
            if "`exp'" == "" local exp = 1
            bysort `touse' `varlist' : ///
                gen double `freq' = sum(`exp') * `touse'
            by `touse' `varlist' : ///
                replace `freq' = (_n == _N) * `freq'[_N]
            label var `freq' "Freq."

            if `min' > 0 {
                local which "`freq' >= `min'"
            }
            else if `nmodes' > 0 {
                sort `touse' `freq' `varlist'
                count if `freq'
                local nmodes = min(`nmodes', r(N))
                local which "`freq' >= `freq'[_N - `nmodes' + 1]"
            }
            else {
                su `freq', meanonly
                local max = r(max)
                local which "`freq' == `max'"
            }

            count if `which'
            if r(N) == 0 {
                di as err "no such modes in data"
                exit 498
            }
        }

        tabdisp `varlist' if `which', c(`freq')

        quietly if "`generate'" != "" {
            gen byte `generate' = `which' if `touse'
            bysort `touse' `varlist' (`generate') : ///
                replace `generate' = `generate'[_N] if `touse'
        }
    end

    ------------------------------------------------------------------------
    help for modes                         (SJ9-4: sg113_2; SJ3-2: sg113_1)
    ------------------------------------------------------------------------

    Tabulation of mode(s)

        modes varname [weight] [if exp] [in range]
              [ , { min(#) | nmodes(#) } generate(newvar) ]

    Description

        modes tabulates the mode(s) of varname, that is, the value(s) of
        varname that occur most frequently. varname may be numeric or
        string. fweights and aweights are allowed. Missing values are
        ignored.

        modes is most obviously useful with a discrete or categorical
        variable. Continuous variables may need to be placed in bins or
        classes first.

    Options

        min(#) specifies that all values with a frequency of # or more
        should be shown.

        nmodes(#) specifies that # modes should be shown. However, if ties
        in frequency make identification of precisely # modes arbitrary,
        all such tied modes will be shown. Note that fewer modes will be
        shown if fewer than # modes exist.

        min() and nmodes() may not be specified together.

        generate(newvar) generates an indicator variable that is missing
        if varname is missing or observations are excluded by if or in, 1
        whenever the value of varname is one of the displayed modes, and 0
        otherwise.

    Examples

        . modes rep78
        . modes rep78 if foreign
        . modes mpg, min(3)
        . modes mpg, nmodes(3)
        . modes turn, nmodes(10) gen(flag)

    Author

        Nicholas J. Cox, Durham University, U.K.
        [email protected]

    Acknowledgments

        A problem posed by Sylvain Friederich led to the nmodes() option.
        A problem posed by Elan Cohen led to the generate() option.

    Also see

        STB:     STB-50 sg113
        Online:  help for tabulate, kdensity, egen

Nick
[email protected]

Martin Weiss

As discussed last night between me and Sergiy: You want the whole dataset with all variables intact plus one that denotes membership in the "club of most frequent values of mpg"...

[email protected]

Suppose we need to flag the 5 most frequent values, how about the following typings?

    sysuse auto, clear
    keep mpg
    bys mpg: egen mycount=count(mpg)
    bys mycount: g num=_n
    gsort num -mycount
    g tag=_n<=5
    bys mycount: egen rank5=max(tag)

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
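For comparison with the criteria listed in the message (missing values, ties), here is a minimal hand-rolled sketch using only official commands. It is not the -modes- implementation. It uses rep78 from the auto data because that variable has missing values, keeps every variable and the original sort order (Martin's point), ignores missing values, and flags every value tied at the cutoff frequency, so more than k values may be flagged. It does not handle weights. The names k, obsno, freq, first and topk are illustrative choices, not anything defined by -modes-.

```stata
* Sketch (not -modes- itself): flag observations whose value of rep78 is
* among the k most frequent values. Missing values are ignored; all values
* tied at the cutoff frequency are flagged; weights are NOT handled.
* k, obsno, freq, first and topk are illustrative names.
sysuse auto, clear
local k 3

* remember the original sort order so it can be restored at the end
gen long obsno = _n

* frequency of each distinct nonmissing value; missing rep78 gets missing
bysort rep78: gen long freq = _N if !missing(rep78)

* mark one representative observation per distinct nonmissing value
by rep78: gen byte first = (_n == 1) & !missing(rep78)

* put the representatives first, in descending order of frequency
gsort -first -freq

* cannot ask for more modes than there are distinct values
count if first
local kk = min(`k', r(N))

* the frequency of the kk-th most common value is the cutoff
local cutoff = freq[`kk']

* 1 if the value's frequency reaches the cutoff (ties included),
* 0 if not, missing if rep78 is missing
gen byte topk = freq >= `cutoff' if !missing(rep78)

* restore the original order and drop the helper variables
sort obsno
drop obsno first
```

With the updated -modes- installed, `modes rep78, nmodes(3) generate(flag)` should yield the same indicator, and `assert flag == topk` is a quick way to check that.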


© Copyright 1996–2024 StataCorp LLC