Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Flagging most frequent occurrence

From	Steven Archambault <[email protected]>
To	[email protected]
Subject	Re: st: Flagging most frequent occurrence
Date	Fri, 25 Oct 2013 14:20:40 -0600

Thanks for the suggestions, folks. I was not sure whether my code
would catch a tie, and I still do not know how a tie would be
handled. Would the flag be given to both years within a panel?

replace flag=1 if most_freq==year
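
One way to see what happens with ties is to run the per-panel approach on a small dataset. The following is a minimal sketch, assuming the example data from Maarten's quoted message and the two -bysort- statements from Nick's quoted message (the variable names count and ismode are his); it is an illustration, not a claim about how most_freq was built.

```stata
// example data from the quoted message; panel 2 has a tie (2009 and 2010)
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

// number of observations for each id-year combination
bysort id year : gen count = _N

// within each id, sort by count so the largest count is last,
// then flag every observation whose count equals that maximum
bysort id (count) : gen ismode = count == count[_N]

// restore a readable order and inspect the flags
sort id year
list, sepby(id)
```

In panel 2 both years occur twice, so every observation in that panel has count equal to the panel maximum and all four get ismode equal to 1. In other words, with this approach a tie flags both years. If most_freq was instead created with -egen, mode()-, ties may be treated differently: as I understand -help egen-, mode() returns missing by default when the mode is not unique unless minmode, maxmode or nummode() is specified, so it is worth checking that help file.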

On Thu, Oct 24, 2013 at 2:11 AM, Nick Cox <[email protected]> wrote:
> The most frequent value is often called the mode, and that's a keyword
> to use in a -search-. In fact, -egen- has a -mode()- function,
> although it is easier here to avoid it.
>
> Maarten has given one solution, but he has flagged year(s) that occur
> most frequently in the dataset as a whole. Here is another solution
> that flags year(s) that occur most frequently within each panel, which
> Steven seems to be asking for.
>
> Note that Steven's
>
> replace flag=1 most_freq==year
>
> is lacking an -if-.
>
> My suggestion:
>
> bysort id year : gen count = _N
> bysort id (count) : gen ismode = count == count[_N]
>
> Under the hood, -egen- is most often doing stuff like this, using
> -by:-, sorting and heavy use of _n and _N and getting indicator
> variables out of true-or-false evaluations (1 is true and 0 is false).
>
> In the second statement just above, we -sort- the values with the
> highest count to the end of each panel; then the modes are just the
> values with the highest count, and this works even if there are ties
> for -year-.
>
> I don't see why Steven's code isn't equivalent, assuming correction of
> the typo above.
>
> Nick
> [email protected]
>
>
> On 24 October 2013 08:40, Maarten Buis <[email protected]> wrote:
>
>> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>>> I have panel data, where observations occur in different years. I want
>>> to flag the year that occurs the most often.
>>
>> *------------------ begin example ------------------
>> // input some example data
>> clear all
>> input ///
>> id year
>> 1 2008
>> 1 2008
>> 1 2009
>> 2 2009
>> 2 2009
>> 2 2010
>> 2 2010
>> 3 2009
>> 3 2009
>> 3 2010
>> end
>>
>> // compute the flag
>> bys year : gen flag = _N
>> sum flag, meanonly
>> replace flag = (flag == r(max))
>>
>> // admire the result
>> sort id year
>> list, sepby(id)
>> *------------------- end example -------------------
>> * (For more on examples I sent to the Statalist see:
>> * http://www.maartenbuis.nl/example_faq )
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

