
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Flagging most frequent occurrence


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Flagging most frequent occurrence
Date   Sat, 26 Oct 2013 10:19:58 +0100

Yes: the flag goes to both years if they tie, and indeed to more than
two if they all tie. The principle is independent of panel structure:
if several years all have the highest count, they will all be flagged.

. set obs 10
obs was 0, now 10

. gen year = 2000 + _n

. replace year = 2001 in 2
(1 real change made)

. replace year = 2010 in 9
(1 real change made)

. tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2001 |          2       20.00       20.00
       2003 |          1       10.00       30.00
       2004 |          1       10.00       40.00
       2005 |          1       10.00       50.00
       2006 |          1       10.00       60.00
       2007 |          1       10.00       70.00
       2008 |          1       10.00       80.00
       2010 |          2       20.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. bysort year : gen count = _N

. su count, meanonly

. gen ismode = count == r(max)

. l

     +-----------------------+
     | year   count   ismode |
     |-----------------------|
  1. | 2001       2        1 |
  2. | 2001       2        1 |
  3. | 2003       1        0 |
  4. | 2004       1        0 |
  5. | 2005       1        0 |
     |-----------------------|
  6. | 2006       1        0 |
  7. | 2007       1        0 |
  8. | 2008       1        0 |
  9. | 2010       2        1 |
 10. | 2010       2        1 |
     +-----------------------+
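
The same check can be run within panels. What follows is a minimal
sketch, not part of the original exchange: it assumes a panel identifier
named id and reuses the example data and the two -bysort- lines quoted
further down in this thread. The -egen- line and the variable name
modeyear are added here only for contrast; by default -egen-'s -mode()-
returns missing when a group has more than one mode, unless -minmode-,
-maxmode- or -nummode()- is specified.

```stata
* Example data from the quoted thread: panel 2 has a tie (2009 and 2010)
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

* Frequency of each year within each panel
bysort id year : gen count = _N

* Sort the highest count to the end of each panel, then flag every
* observation whose count equals that maximum (ties are all flagged)
bysort id (count) : gen ismode = count == count[_N]

* For contrast (hypothetical variable name): -egen-'s -mode()- without a
* tie-breaking option gives missing where a panel has several modes
bysort id : egen modeyear = mode(year)

sort id year
list, sepby(id)
```

In panel 2, 2009 and 2010 each occur twice, so all four observations
get ismode == 1, while modeyear is missing for that panel.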

Nick
njcoxstata@gmail.com


On 25 October 2013 21:20, Steven Archambault <archstevej@gmail.com> wrote:
> Thanks for the suggestions folks. I was not quite sure if my code
> would catch the tie, or how that tie would work. I still do not really
> know how the tie would be handled. Would the flag be given to both
> years for a panel?
>
> replace flag=1 if most_freq==year
>
> On Thu, Oct 24, 2013 at 2:11 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> The most frequent value is often called the mode, and that's a keyword
>> to use in a -search-. In fact, -egen- has a -mode()- function,
>> although it is easier here to avoid it.
>>
>> Maarten has given one solution, but he has flagged year(s) that occur
>> most frequently in the dataset as a whole. Here is another solution
>> that flags year(s) that occur most frequently within each panel, which
>> Steven seems to be asking for.
>>
>> Note that Steven's
>>
>> replace flag=1 most_freq==year
>>
>> is lacking an -if-.
>>
>> My suggestion:
>>
>> bysort id year : gen count = _N
>> bysort id (count) : gen ismode = count == count[_N]
>>
>> Under the hood, -egen- is most often doing stuff like this, using
>> -by:-, sorting and heavy use of _n and _N and getting indicator
>> variables out of true-or-false evaluations (1 is true and 0 is false).
>>
>> In the second statement just above, we -sort- the values with the
>> highest count to the end of each panel; then the modes are just the
>> values with the highest count, and this works even if there are ties
>> for -year-.
>>
>> I don't see why Steven's code isn't equivalent, assuming correction of
>> the typo above.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 24 October 2013 08:40, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>
>>> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>>>> I have panel data, where observations occur in different years. I want
>>>> to flag the year that occurs the most often.
>>>
>>> *------------------ begin example ------------------
>>> // input some example data
>>> clear all
>>> input ///
>>> id year
>>> 1 2008
>>> 1 2008
>>> 1 2009
>>> 2 2009
>>> 2 2009
>>> 2 2010
>>> 2 2010
>>> 3 2009
>>> 3 2009
>>> 3 2010
>>> end
>>>
>>> // compute the flag
>>> bys year : gen flag = _N
>>> sum flag, meanonly
>>> replace flag = (flag == r(max))
>>>
>>> // admire the result
>>> sort id year
>>> list, sepby(id)
>>> *------------------- end example -------------------
>>> * (For more on examples I sent to the Statalist see:
>>> * http://www.maartenbuis.nl/example_faq )
>>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

