Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From |
Nick Cox <njcoxstata@gmail.com> |

To |
"statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |

Subject |
Re: st: Flagging most frequent occurrence |

Date |
Thu, 24 Oct 2013 09:11:06 +0100 |

The most frequent value is often called the mode, and that's a keyword to use in a -search-. In fact, -egen- has a -mode()- function, although it is easier here to avoid it.

Maarten has given one solution, but he has flagged year(s) that occur most frequently in the dataset as a whole. Here is another solution that flags year(s) that occur most frequently within each panel, which Steven seems to be asking for.

Note that Steven's

    replace flag=1 most_freq==year

is lacking an -if-.

My suggestion:

    bysort id year : gen count = _N
    bysort id (count) : gen ismode = count == count[_N]

Under the hood, -egen- is most often doing stuff like this, using -by:-, sorting and heavy use of _n and _N and getting indicator variables out of true-or-false evaluations (1 is true and 0 is false).

In the second statement just above, we -sort- the values with the highest count to the end of each panel; then the modes are just the values with the highest count, and this works even if there are ties for -year-.

I don't see why Steven's code isn't equivalent, assuming correction of the typo above.

Nick
njcoxstata@gmail.com

On 24 October 2013 08:40, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>> I have panel data, where observations occur in different years. I want
>> to flag the year that occurs the most often.
>
> *------------------ begin example ------------------
> // input some example data
> clear all
> input ///
> id year
> 1 2008
> 1 2008
> 1 2009
> 2 2009
> 2 2009
> 2 2010
> 2 2010
> 3 2009
> 3 2009
> 3 2010
> end
>
> // compute the flag
> bys year : gen flag = _N
> sum flag, meanonly
> replace flag = (flag == r(max))
>
> // admire the result
> sort id year
> list, sepby(id)
> *------------------- end example -------------------
> * (For more on examples I sent to the Statalist see:
> * http://www.maartenbuis.nl/example_faq )
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
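To see the within-panel flags concretely, here is a sketch that applies the two -bysort- statements suggested above to Maarten's example data. The -assert- lines are an addition for checking, not part of either original post; they state what the flags should be for these ten observations.

```stata
* Maarten's example data
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

* number of observations of each year within each panel
bysort id year : gen count = _N

* sort the highest counts to the end of each panel, then flag
* every observation whose count equals that highest count
bysort id (count) : gen ismode = count == count[_N]

sort id year
list, sepby(id)

* added checks (assumption: these follow from the data above)
assert ismode == (year == 2008) if id == 1
assert ismode == 1              if id == 2
assert ismode == (year == 2009) if id == 3
```

Panel 2 is the tie case: 2009 and 2010 each occur twice there, so all four observations are flagged. Maarten's whole-dataset flag would instead pick out 2009 in every panel, since 2009 occurs five times overall.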

Follow-Ups:
- Re: st: Flagging most frequent occurrence, from Steven Archambault <archstevej@gmail.com>

References:
- st: Flagging most frequent occurrence, from Steven Archambault <archstevej@gmail.com>
- Re: st: Flagging most frequent occurrence, from Maarten Buis <maartenlbuis@gmail.com>
