Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Data management _flag on the basis of the frequency


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Data management _flag on the basis of the frequency
Date   Wed, 14 Mar 2012 00:33:07 +0000

Thanks for the gentle correction. In addition, my code assumes that a
-year- variable has been defined. Defining a -year- and -month-
variable as well may make the code a little clearer too.

* assuming -firm_id-, -director-, -date-

gen flag = 0
gen possible = 0
gen year = year(date)
gen month = month(date)

quietly forval i = 1/`=_N' {
       replace possible = firm_id == firm_id[`i'] & director ==
director[`i'] & inrange(year, year[`i'] - 2, year[`i']) & month ==
month[`i']
       tab year if possible
       if r(r) == 3 replace flag = 1 in `i'
}

On Tue, Mar 13, 2012 at 11:21 PM, Eric Booth <eric.a.booth@gmail.com> wrote:

> I think NJC's example for looping over the observations is instructive.  -reshape- can be slow with large datasets on machines with limited computing power and my coding is far less straightforward than his method.
>
> I adapted his code example to the example I showed earlier and there is one minor error in that the "year(date[`i']), year(date[`i']) -2)"  part of the inlist() command should be reversed (I should be sure to reiterate NJC's mention that he hadn't tested this code example).
>
> Our examples produce the same result if you remove this line from my example:
>
>  bys director_id month year: replace mark= . if _n != _N
>
> which recodes 'mark' to missing for these repeated values.  NJC's code marks all the duplicate years - mine did not due to the line above.  It's still not clear which the OP wanted, but this reconciles that difference in the outcome of our code.
>
> Finally, NJC's code does take into account the 'firm_id' which should be added to the -bysort- prefix part of several lines in my example.  My code works only because 'firm_id' is a constant in the example data.
>
> So, if you append this version of NJC's code to my earlier example, you'll get the same results with the two different approaches:
>
>
>
> *********************!
>
> **clean up over from my example
> **first, comment this out in my example above--
> ******bys director_id month year: replace mark= . if _n != _N
>    recode mark (.=0)
>
>
> ***NJC's example with minor corrections:
> gen flag = 0
> gen possible = 0
> quietly forval i = 1/`=_N' {
>
> replace possible =  firm_id == firm_id[`i'] & ///
>  director == director[`i'] &  ///
>  inrange(year(date),  year(date[`i']) - 2,  year(date[`i'])) ///
>  & month(date) == month(date[`i'])
>
>     tab year if possible
>   if r(r) == 3 replace flag = 1 in `i'
> }
>
> sort firm dir month year
> assert mark==flag  //assumes you commented out the last line in my example
> l if inlist(1, flag, mark)
> *********************!
>
> - Eric
>
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> ebooth@ppri.tamu.edu
> +979.845.6754
>
>
> On Mar 13, 2012, at 2:30 PM, Nick Cox wrote:
>
>> Here's another way to approach it. I assume the existence of a daily
>> date variable -date-.
>>
>> gen flag = 0
>> gen possible = 0
>>
>> quietly forval i = 1/`=_N' {
>>       replace possible = firm_id == firm_id[`i'] & director ==
>> director[`i'] & inrange(year(date), year(date[`i']), year(date[`i']) -
>> 2) & month(date) == month(date[`i'])
>>
>>       tab year if possible
>>       if r(r) == 3 replace flag = 1 in `i'
>> }
>>
>> The idea is a loop over observations. The rules are (a) same firm (b)
>> same director (c) this year or previous two (d) same month. That
>> defines possibles. We still need to check that all three years are
>> represented in the possibles, which is true if a table of years has
>> precisely three rows.
>>
>> This is the often despised "loop over observations". Occasionally it
>> does yield code that is rather direct if also rather messy. It is not
>> very fast, but it is often relatively quick to code.
>>
>> Code not tested. For more discussion of the strategy, see
>>
>> SJ-7-3  pr0033  . . . . . . . . . . . . . .  Stata tip 51: Events in intervals
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>        Q3/07   SJ 7(3):440--443                                 (no commands)
>>        tip for counting or summarizing irregularly spaced
>>        events in intervals
>>
>> which is accessible to all via the Stata Journal website.
>>
>> However, the rules don't seem complete. If there is a record in
>> 9/1998, 9/1999, 9/2000 shouldn't the last be flagged 1 similarly?
>>
>> Nick
>>
>> On Tue, Mar 13, 2012 at 12:55 PM, Agnieszka Trzeciakiewicz
>> <agazto@gmail.com> wrote:
>>
>>> I'm trying to mark an occurrence of an event in a particular firm, made by a
>>> particular director.
>>>
>>> The idea is to mark an occurrence with number 1, if there is an available
>>> record in the same month over the past three consecutive years.
>>> For instance if for a director A there is a record in 09/2000, 09/2001, and
>>> 09/2002 I would like to assign year 2000 with mark 0, year 2001 with mark 0
>>> and 2002 with mark 1.
>>>
>>> I have used Excel to solve my problem, and applied a relevant long formula .
>>> However, Excel broke down each time I run it. The original file has over
>>> 50000 records. Please find the table below.
>>>
>>> Is it possible to create ''mark'' column using STATA software and its data
>>> management codes?
>>> Thanks for your help.
>>> Best wishes,
>>> Agnieszka
>>>
>>> Table:
>>> full date       firm_id director_id     mark
>>> 22/10/2002      1       7       0
>>> 07/04/2003      1       7       0
>>> 11/04/2003      1       7       0
>>> 01/10/2003      1       7       0
>>> 01/10/2003      1       7       0
>>> 20/10/2003      1       7       0
>>> 07/04/2004      1       7       0
>>> 08/04/2004      1       7       0
>>> 16/04/2004      1       7       0
>>> 25/10/2004      1       7       1
>>> 12/11/2004      1       7       0
>>> 07/04/2005      1       7       1
>>> 27/04/2005      1       7       1
>>> 25/08/2005      1       7       0
>>> 05/09/2005      1       7       0
>>> 12/09/2005      1       7       0
>>> 25/10/2005      1       7       1
>>> 25/04/2006      1       7       1
>>> 25/04/2006      1       7       1
>>> 05/09/2006      1       7       0
>>> 05/09/2006      1       7       0
>>> 28/09/2006      1       7       0
>>> 28/09/2006      1       7       0
>>> 28/09/2006      1       7       0
>>> 28/09/2006      1       7       0
>>> 28/09/2006      1       7       0
>>> 25/10/2006      1       7       1
>>> 27/04/2007      1       7       1
>>> 27/04/2007      1       7       1
>>> 06/09/2007      1       7       1
>>> 06/09/2007      1       7       1
>>> 06/09/2007      1       7       1
>>> 06/09/2007      1       7       1
>>> 06/09/2007      1       7       1
>>> 26/10/2007      1       7       1
>>> 01/11/2007      1       7       0
>>> 31/03/2008      1       7       0
>>> 31/03/2008      1       7       0
>>> 24/04/2008      1       7       1
>>> 24/04/2008      1       7       1
>>> 03/09/2008      1       7       1
>>> 03/09/2008      1       7       1
>>> 16/09/2008      1       7       1
>>> 14/10/2008      1       7       1
>>> 07/04/2009      1       7       1
>>> 07/04/2009      1       7       1
>>> 07/09/2009      55      7       0
>>> 07/09/2009      55      7       0
>>> 07/09/2009      55      7       0
>>> 23/09/2009      55      7       0
>>> 23/09/2009      55      7       0
>>> 23/09/2009      1       7       1
>>> 08/10/2009      1       7       1
>>> 22/03/2010      1       7       0
>>> 22/03/2010      1       7       0
>>> 23/03/2010      1       7       0
>>> 23/03/2010      1       7       0
>>> 23/03/2010      1       7       0
>>> 12/04/2010      1       7       1
>>> 07/09/2010      1       7       1
>>> 07/09/2010      1       7       1
>>> 07/09/2010      1       7       1
>>> 07/09/2010      1       7       1
>>> 07/09/2010      1       7       1
>>> 10/09/2010      1       7       1
>>> 06/10/2010      1       7       1
>>> 27/05/2009      2       8       0
>>> 09/12/2009      2       8       0
>>> 05/02/2010      2       8       0
>>> 20/05/2010      2       8       0
>>> 23/08/2010      2       8       0
>>> 22/04/2008      3       10      0
>>> 18/12/2008      3       10      0
>>> 24/04/2009      3       10      0
>>> 24/04/2009      3       10      0
>>> 19/03/2010      3       10      0
>>> 19/03/2010      3       10      0
>>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index