Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Data management _flag on the basis of the frequency
From
Eric Booth <[email protected]>
To
[email protected]
Subject
Re: st: Data management _flag on the basis of the frequency
Date
Tue, 13 Mar 2012 10:44:46 -0500
Yes, I did miss the "past 3 years" part (also, you said "consecutive" - I assume this condition still holds (?)).
Following your example for director A -- How would I know to mark 2002 as the year with 2 prior, consecutive years[1] and not 2003 (or better, shouldn't I mark 2002 and 2003 since they both have 2 consecutive years?) or 2004.... Choosing to mark 2002 over 2003 in your example seems arbitrary.
If you want to mark any observation with 2 prior, consecutive years, removing duplicate years (within the same director_id/month) makes this much much easier.
However, if you do not want to remove duplicate years, the strategy I prefer is to -reshape- to wide, then mark your consecutive years and then -reshape- back to long format.
Here's an example of both options (again, I mark ANY observation with 2 prior, consecutive years -- if there is some decision rule about why you picked 2002 and not 2003 & 2004 in your example, please make that clear or make changes to my code where directed below to take that decision rule into account):
********************!
clear
**data ex. shortened a bit:===>
inp str15 fulldate firm_id director_id mark
"22/10/2002" 1 7 0
"07/04/2003" 1 7 0
"11/04/2003" 1 7 0
"01/10/2003" 1 7 0
"01/10/2003" 1 7 0
"20/10/2003" 1 7 0
"07/04/2004" 1 7 0
"08/04/2004" 1 7 0
"16/04/2004" 1 7 0
"25/10/2004" 1 7 1
"12/11/2004" 1 7 0
"07/04/2005" 1 7 1
"27/04/2005" 1 7 1
"25/08/2005" 1 7 0
"05/09/2005" 1 7 0
"12/09/2005" 1 7 0
"25/10/2005" 1 7 1
"25/04/2006" 1 7 1
"25/04/2006" 1 7 1
"05/09/2006" 1 7 0
"05/09/2006" 1 7 0
"28/09/2006" 1 7 0
"28/09/2006" 1 7 0
"28/09/2006" 1 7 0
"28/09/2006" 1 7 0
"28/09/2006" 1 7 0
"25/10/2006" 1 7 1
"27/04/2007" 1 7 1
"27/04/2007" 1 7 1
"06/09/2007" 1 7 1
"06/09/2007" 1 7 1
"06/09/2007" 1 7 1
"06/09/2007" 1 7 1
end
drop mark
**format dates
*see -help dates-
g date = date(fulldate, "DMY")
format date %td
g month = month(date)
g year = year(date)
drop fulldate
sort director_id month year
**first, duplicate years breaks this:
bys director_id month: g mark = 1 if ///
year[_n-1] == year-1 & year[_n-2] == year-2
drop mark
***********************
**Option 1: so removing duplicates makes this work:
preserve
duplicates drop director_id month year, force
bys director_id month: g mark2 = 1 if ///
year[_n-1] == year-1 & year[_n-2] == year-2
recode mark2 (.=0)
l if director_id == 7 & month ==4
restore
***********************
**Option 2: without removing duplicates:
bys director_id month year: g dups = _n
reshape wide date, i(firm_id director_id month year) j(dups)
**now you can decide which 3 consecutive years make sense to mark and reshape back:
bys director_id month: g mark = 1 if ///
year[_n-1] == year-1 & year[_n-2] == year-2
**insert some other rule about how to pick the year to look backward from**
**instead this marks any year with 2 prior, consecutive years by director id & month
**reshape back to long form:
reshape long date, i(firm_id director_id month year) j(dups)
drop if mi(date)
**mark only one obs of each director/month/year combo
bys director_id month year: replace mark= . if _n != _N
*********************!
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
[1] You say "past 3 years", but really it is 2 years prior to the year of interest
On Mar 13, 2012, at 10:05 AM, Agnieszka Trzeciakiewicz wrote:
> Eric, thank you for your response.
> The idea is to see if in the past 3 years there is a record in the same
> month. It does not matter which year is the last one. What matters is that
> there are 2 years prior to that record which included records from the same
> month.
>
> Following my previous example, for a director A there is a record in
> 09/2000, 09/2001,
> 09/2002, 09/2003, 09/2004 I would like to assign year 2000 with mark 0, year
> 2001 with mark 0 and 2002 with mark 1 and 2003 with mark zero. No matter how
> many transactions appeared in the same firm, made by the same director in a
> particular September. If however in September 2005 there is no transaction,
> and the same director of the same firm makes a transaction in September
> 2006, I would like to assign 0 to this record (as, there are no records from
> 3 past consecutive years).
>
> Please note, that director A may work in 2 firms, therefore the command must
> be dependent on a firm as well.
> Thanks
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Eric Booth
> Sent: 13 March 2012 14:48
> To: [email protected]
> Subject: Re: st: Data management _flag on the basis of the frequency
>
> <>
>
>
> Since you've got repeat years (see below) that could be the last year that
> you want to mark with a 1 -- you need to decide if there is some kind of
> decision rule for which observation to mark. So, for director "7" in month
> "4" your last year is 2007, but there are several observations. You need to
> decide if there is some other variable that determines which is the last
> observation or perhaps -collapse- the data in some way to remove this
> duplication. Without making any corrections for this issue, here is an
> example of how to mark your observations of interest.
>
> ****************!
> clear
> **data ex. shortened a bit:===>
> inp str15 fulldate firm_id director_id mark
> "22/10/2002" 1 7 0
> "07/04/2003" 1 7 0
> "11/04/2003" 1 7 0
> "01/10/2003" 1 7 0
> "01/10/2003" 1 7 0
> "20/10/2003" 1 7 0
> "07/04/2004" 1 7 0
> "08/04/2004" 1 7 0
> "16/04/2004" 1 7 0
> "25/10/2004" 1 7 1
> "12/11/2004" 1 7 0
> "07/04/2005" 1 7 1
> "27/04/2005" 1 7 1
> "25/08/2005" 1 7 0
> "05/09/2005" 1 7 0
> "12/09/2005" 1 7 0
> "25/10/2005" 1 7 1
> "25/04/2006" 1 7 1
> "25/04/2006" 1 7 1
> "05/09/2006" 1 7 0
> "05/09/2006" 1 7 0
> "28/09/2006" 1 7 0
> "28/09/2006" 1 7 0
> "28/09/2006" 1 7 0
> "28/09/2006" 1 7 0
> "28/09/2006" 1 7 0
> "25/10/2006" 1 7 1
> "27/04/2007" 1 7 1
> "27/04/2007" 1 7 1
> "06/09/2007" 1 7 1
> "06/09/2007" 1 7 1
> "06/09/2007" 1 7 1
> "06/09/2007" 1 7 1
> end
> drop mark
>
> **format dates
> *see -help dates-
> g date = date(fulldate, "DMY")
> format date %td
> g month = month(date)
> g year = year(date)
>
>
> **create mark2
> sort director_id month year
> bys director_id month : g mark2 = 1 if _n== _N recode mark2 (.=0)
>
> **example:
> l if director_id == 7 & month ==4 //note the repeat years within april for
> director "7"
> ****************!
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
> Office: +979.845.6754
>
>
> On Mar 13, 2012, at 7:55 AM, Agnieszka Trzeciakiewicz wrote:
>
>> Hi!
>> I'm trying to mark an occurrence of an event in a particular firm,
>> made by a particular director.
>>
>> The idea is to mark an occurrence with number 1, if there is an
>> available record in the same month over the past three consecutive years.
>> For instance if for a director A there is a record in 09/2000,
>> 09/2001, and
>> 09/2002 I would like to assign year 2000 with mark 0, year 2001 with
>> mark 0 and 2002 with mark 1.
>>
>> I have used Excel to solve my problem, and applied a relevant long formula
> .
>> However, Excel broke down each time I run it. The original file has
>> over
>> 50000 records. Please find the table below.
>>
>> Is it possible to create ''mark'' column using STATA software and its
>> data management codes?
>> Thanks for your help.
>> Best wishes,
>> Agnieszka
>>
>> Table:
>> full date firm_id director_id mark
>> 22/10/2002 1 7 0
>> 07/04/2003 1 7 0
>> 11/04/2003 1 7 0
>> 01/10/2003 1 7 0
>> 01/10/2003 1 7 0
>> 20/10/2003 1 7 0
>> 07/04/2004 1 7 0
>> 08/04/2004 1 7 0
>> 16/04/2004 1 7 0
>> 25/10/2004 1 7 1
>> 12/11/2004 1 7 0
>> 07/04/2005 1 7 1
>> 27/04/2005 1 7 1
>> 25/08/2005 1 7 0
>> 05/09/2005 1 7 0
>> 12/09/2005 1 7 0
>> 25/10/2005 1 7 1
>> 25/04/2006 1 7 1
>> 25/04/2006 1 7 1
>> 05/09/2006 1 7 0
>> 05/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 25/10/2006 1 7 1
>> 27/04/2007 1 7 1
>> 27/04/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 26/10/2007 1 7 1
>> 01/11/2007 1 7 0
>> 31/03/2008 1 7 0
>> 31/03/2008 1 7 0
>> 24/04/2008 1 7 1
>> 24/04/2008 1 7 1
>> 03/09/2008 1 7 1
>> 03/09/2008 1 7 1
>> 16/09/2008 1 7 1
>> 14/10/2008 1 7 1
>> 07/04/2009 1 7 1
>> 07/04/2009 1 7 1
>> 07/09/2009 55 7 0
>> 07/09/2009 55 7 0
>> 07/09/2009 55 7 0
>> 23/09/2009 55 7 0
>> 23/09/2009 55 7 0
>> 23/09/2009 1 7 1
>> 08/10/2009 1 7 1
>> 22/03/2010 1 7 0
>> 22/03/2010 1 7 0
>> 23/03/2010 1 7 0
>> 23/03/2010 1 7 0
>> 23/03/2010 1 7 0
>> 12/04/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 10/09/2010 1 7 1
>> 06/10/2010 1 7 1
>> 27/05/2009 2 8 0
>> 09/12/2009 2 8 0
>> 05/02/2010 2 8 0
>> 20/05/2010 2 8 0
>> 23/08/2010 2 8 0
>> 22/04/2008 3 10 0
>> 18/12/2008 3 10 0
>> 24/04/2009 3 10 0
>> 24/04/2009 3 10 0
>> 19/03/2010 3 10 0
>> 19/03/2010 3 10 0
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/