Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steven Archambault <archstevej@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Flagging most frequent occurrence

Date: Fri, 25 Oct 2013 14:20:40 -0600

Thanks for the suggestions folks. I was not quite sure if my code would catch the tie, or how that tie would work. I still do not really know how the tie would be handled. Would the flag be given to both years for a panel?

replace flag=1 if most_freq==year

On Thu, Oct 24, 2013 at 2:11 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The most frequent value is often called the mode, and that's a keyword
> to use in a -search-. In fact, -egen- has a -mode()- function,
> although it is easier here to avoid it.
>
> Maarten has given one solution, but he has flagged year(s) that occur
> most frequently in the dataset as a whole. Here is another solution
> that flags year(s) that occur most frequently within each panel, which
> Steven seems to be asking for.
>
> Note that Steven's
>
> replace flag=1 most_freq==year
>
> is lacking an -if-.
>
> My suggestion:
>
> bysort id year : gen count = _N
> bysort id (count) : gen ismode = count == count[_N]
>
> Under the hood, -egen- is most often doing stuff like this, using
> -by:-, sorting and heavy use of _n and _N and getting indicator
> variables out of true-or-false evaluations (1 is true and 0 is false).
>
> In the second statement just above, we -sort- the values with the
> highest count to the end of each panel; then the modes are just the
> values with the highest count, and this works even if there are ties
> for -year-.
>
> I don't see why Steven's code isn't equivalent, assuming correction of
> the typo above.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 24 October 2013 08:40, Maarten Buis <maartenlbuis@gmail.com> wrote:
>
>> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>>> I have panel data, where observations occur in different years. I want
>>> to flag the year that occurs the most often.
>>
>> *------------------ begin example ------------------
>> // input some example data
>> clear all
>> input ///
>> id year
>> 1 2008
>> 1 2008
>> 1 2009
>> 2 2009
>> 2 2009
>> 2 2010
>> 2 2010
>> 3 2009
>> 3 2009
>> 3 2010
>> end
>>
>> // compute the flag
>> bys year : gen flag = _N
>> sum flag, meanonly
>> replace flag = (flag == r(max))
>>
>> // admire the result
>> sort id year
>> list, sepby(id)
>> *------------------- end example -------------------
>> * (For more on examples I sent to the Statalist see:
>> * http://www.maartenbuis.nl/example_faq )
>>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
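On the tie question, one way to check is to run Nick's two-line suggestion on Maarten's example data, where panel 2 has 2009 and 2010 tied at two observations each. What follows is only a sketch: the variables -tag- and -nmodes- are illustrative additions that appear nowhere in the thread, and they are there just to show which panels have more than one modal year.

```stata
* Maarten's example data from the thread
clear all
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

* Nick's suggestion: count each year within panel, then flag the highest count
bysort id year : gen count = _N
bysort id (count) : gen ismode = count == count[_N]

* Illustrative addition (not in the thread): number of distinct modal years per panel
egen tag = tag(id year)
egen nmodes = total(tag * ismode), by(id)

sort id year
list, sepby(id)
```

With these data every observation in panel 2 gets ismode == 1, so both tied years are flagged, and nmodes is 2 for that panel and 1 for the others. If only one year per panel should carry the flag, some tie-breaking rule (for example, the earliest modal year) would have to be chosen and added explicitly.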

