Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Flagging most frequent occurrence
Date: Sat, 26 Oct 2013 10:19:58 +0100

Yes; indeed to two or more years if they both tie. The principle is
independent of panel structure. If several years all have the highest
count, they will all be flagged.

. set obs 10
obs was 0, now 10

. gen year = 2000 + _n

. replace year = 2001 in 2
(1 real change made)

. replace year = 2010 in 9
(1 real change made)

. tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2001 |          2       20.00       20.00
       2003 |          1       10.00       30.00
       2004 |          1       10.00       40.00
       2005 |          1       10.00       50.00
       2006 |          1       10.00       60.00
       2007 |          1       10.00       70.00
       2008 |          1       10.00       80.00
       2010 |          2       20.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. bysort year : gen count = _N

. su count, meanonly

. gen ismode = count == r(max)

. l

     +-----------------------+
     | year   count   ismode |
     |-----------------------|
  1. | 2001       2        1 |
  2. | 2001       2        1 |
  3. | 2003       1        0 |
  4. | 2004       1        0 |
  5. | 2005       1        0 |
     |-----------------------|
  6. | 2006       1        0 |
  7. | 2007       1        0 |
  8. | 2008       1        0 |
  9. | 2010       2        1 |
 10. | 2010       2        1 |
     +-----------------------+

Nick
njcoxstata@gmail.com

On 25 October 2013 21:20, Steven Archambault <archstevej@gmail.com> wrote:
> Thanks for the suggestions folks. I was not quite sure if my code
> would catch the tie, or how that tie would work. I still do not really
> know how the tie would be handled. Would the flag be given to both
> years for a panel?
>
> replace flag=1 if most_freq==year
>
> On Thu, Oct 24, 2013 at 2:11 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> The most frequent value is often called the mode, and that's a keyword
>> to use in a -search-. In fact, -egen- has a -mode()- function,
>> although it is easier here to avoid it.
>>
>> Maarten has given one solution, but he has flagged year(s) that occur
>> most frequently in the dataset as a whole. Here is another solution
>> that flags year(s) that occur most frequently within each panel, which
>> Steven seems to be asking for.
>>
>> Note that Steven's
>>
>> replace flag=1 most_freq==year
>>
>> is lacking an -if-.
>>
>> My suggestion:
>>
>> bysort id year : gen count = _N
>> bysort id (count) : gen ismode = count == count[_N]
>>
>> Under the hood, -egen- is most often doing stuff like this, using
>> -by:-, sorting and heavy use of _n and _N and getting indicator
>> variables out of true-or-false evaluations (1 is true and 0 is false).
>>
>> In the second statement just above, we -sort- the values with the
>> highest count to the end of each panel; then the modes are just the
>> values with the highest count, and this works even if there are ties
>> for -year-.
>>
>> I don't see why Steven's code isn't equivalent, assuming correction of
>> the typo above.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 24 October 2013 08:40, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>
>>> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>>>> I have panel data, where observations occur in different years. I want
>>>> to flag the year that occurs the most often.
>>>
>>> *------------------ begin example ------------------
>>> // input some example data
>>> clear all
>>> input ///
>>> id year
>>> 1 2008
>>> 1 2008
>>> 1 2009
>>> 2 2009
>>> 2 2009
>>> 2 2010
>>> 2 2010
>>> 3 2009
>>> 3 2009
>>> 3 2010
>>> end
>>>
>>> // compute the flag
>>> bys year : gen flag = _N
>>> sum flag, meanonly
>>> replace flag = (flag == r(max))
>>>
>>> // admire the result
>>> sort id year
>>> list, sepby(id)
>>> *------------------- end example -------------------
>>> * (For more on examples I sent to the Statalist see:
>>> * http://www.maartenbuis.nl/example_faq )
>>>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
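
To see how ties behave within panels, as opposed to in the dataset as a whole, here is a minimal sketch that applies the two -bysort- statements suggested earlier in the thread to the example data from Maarten's post. The variable names count and ismode follow that suggestion; nothing else is assumed beyond official Stata commands. In this data panel 2 has two observations each for 2009 and 2010, so both years should end up flagged there.

```stata
* Sketch only: flag the modal year(s) within each panel, ties included.
* Data are the example values from Maarten's post.
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

* number of observations for each id-year combination
bysort id year : gen count = _N

* within each id, sort on count so the largest count comes last,
* then flag every observation whose count equals that maximum
bysort id (count) : gen ismode = count == count[_N]

* restore a readable order and inspect the flags panel by panel
sort id year
list, sepby(id)
```

Because the comparison is against count[_N] within each id, every year that shares the panel's highest count is flagged, which is the same tie behaviour as in the whole-dataset version using r(max).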

**References**:

- **st: Flagging most frequent occurrence** *From:* Steven Archambault <archstevej@gmail.com>
- **Re: st: Flagging most frequent occurrence** *From:* Maarten Buis <maartenlbuis@gmail.com>
- **Re: st: Flagging most frequent occurrence** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Flagging most frequent occurrence** *From:* Steven Archambault <archstevej@gmail.com>
