Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Data management _flag on the basis of the frequency


From   Eric Booth <eric.a.booth@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Data management _flag on the basis of the frequency
Date   Tue, 13 Mar 2012 10:44:46 -0500

Yes, I did miss the "past 3 years" part (also, you said "consecutive" - I assume this condition still holds (?)).  

Following your example for director A -- How would I know to mark 2002 as the year with 2 prior, consecutive years[1]  and not 2003 (or better, shouldn't I mark 2002 and 2003 since they both have 2 consecutive years?) or 2004....  Choosing to mark 2002 over 2003 in your example seems arbitrary.

If you want to mark any observation with 2 prior, consecutive years, removing duplicate years (within the same director_id/month) makes this much much easier.  
However, if you do not want to remove duplicate years, the strategy I prefer is to -reshape- to wide, then mark your consecutive years and then -reshape- back to long format.  
Here's an example of both options (again, I mark ANY observation with 2 prior, consecutive years -- if there is some decision rule about why you picked 2002 and not 2003 & 2004 in your example, please make that clear or make changes to my code where directed below to take that decision rule into account):


********************!
clear
**data ex. shortened a bit:===>
inp str15 fulldate	firm_id	director_id	mark
"22/10/2002"	1	7	0
"07/04/2003"	1	7	0
"11/04/2003"	1	7	0
"01/10/2003"	1	7	0
"01/10/2003"	1	7	0
"20/10/2003"	1	7	0
"07/04/2004"	1	7	0
"08/04/2004"	1	7	0
"16/04/2004"	1	7	0
"25/10/2004"	1	7	1
"12/11/2004"	1	7	0
"07/04/2005"	1	7	1
"27/04/2005"	1	7	1
"25/08/2005"	1	7	0
"05/09/2005"	1	7	0
"12/09/2005"	1	7	0
"25/10/2005"	1	7	1
"25/04/2006"	1	7	1
"25/04/2006"	1	7	1
"05/09/2006"	1	7	0
"05/09/2006"	1	7	0
"28/09/2006"	1	7	0
"28/09/2006"	1	7	0
"28/09/2006"	1	7	0
"28/09/2006"	1	7	0
"28/09/2006"	1	7	0
"25/10/2006"	1	7	1
"27/04/2007"	1	7	1
"27/04/2007"	1	7	1
"06/09/2007"	1	7	1
"06/09/2007"	1	7	1
"06/09/2007"	1	7	1
"06/09/2007"	1	7	1
end
drop mark

**format dates
*see -help dates-
g date = date(fulldate, "DMY")
format date %td
g month = month(date)
g year = year(date)
drop fulldate

sort director_id  month  year



**first, duplicate years breaks this:
bys director_id month: g mark = 1 if  ///
	year[_n-1] == year-1 & year[_n-2] == year-2
drop mark


***********************
**Option 1: so removing duplicates makes this work:
preserve
	duplicates drop director_id month year, force
bys director_id month: g mark2 = 1 if  ///
	year[_n-1] == year-1 & year[_n-2] == year-2
recode mark2 (.=0)
l if director_id == 7 & month ==4 
restore




***********************
**Option 2: without removing duplicates:
bys director_id month year: g dups = _n

reshape wide date, i(firm_id director_id month year) j(dups)


**now you can decide which 3 consecutive years make sense to mark and reshape back:
bys director_id month: g mark = 1 if  ///
	year[_n-1] == year-1 & year[_n-2] == year-2
**insert some other rule about how to pick the year to look backward from**
**instead this marks any year with 2 prior, consecutive years by director id & month


**reshape back to long form:
reshape long date, i(firm_id director_id month year) j(dups)
drop if mi(date)

**mark only one obs of each director/month/year combo
bys director_id month year: replace mark= . if _n != _N
*********************!

- Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754






[1]  You say "past 3 years", but really it is 2 years prior to the year of interest

On Mar 13, 2012, at 10:05 AM, Agnieszka Trzeciakiewicz wrote:

> Eric, thank you for your response.
> The idea is to see if in the past 3 years there is a record in the same
> month. It does not matter which year is the last one. What matters is that
> there are 2 years prior to that record which included records from the same
> month.
> 
> Following my previous example, for a director A there is a record in
> 09/2000, 09/2001, 
> 09/2002, 09/2003, 09/2004 I would like to assign year 2000 with mark 0, year
> 2001 with mark 0 and 2002 with mark 1 and 2003 with mark zero. No matter how
> many transactions appeared in the same firm, made by the same director in a
> particular September. If however in September 2005 there is no transaction,
> and the same director of the same firm makes a transaction in September
> 2006, I would like to assign 0 to this record (as, there are no records from
> 3 past consecutive years).
> 
> Please note, that director A may work in 2 firms, therefore the command must
> be dependent on a firm as well.
> Thanks
> 
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Eric Booth
> Sent: 13 March 2012 14:48
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Data management _flag on the basis of the frequency
> 
> <>
> 
> 
> Since you've got repeat years (see below) that could be the last year that
> you want to mark with a 1 -- you need to decide if there is some kind of
> decision rule for which observation to mark.  So, for director "7" in month
> "4" your last year is 2007, but there are several observations.  You need to
> decide if there is some other variable that determines which is the last
> observation or perhaps -collapse- the data in some way to remove this
> duplication.  Without making any corrections for this issue, here is an
> example of how to mark your observations of interest.
> 
> ****************!
> clear
> **data ex. shortened a bit:===>
> inp str15 fulldate	firm_id	director_id	mark
> "22/10/2002"	1	7	0
> "07/04/2003"	1	7	0
> "11/04/2003"	1	7	0
> "01/10/2003"	1	7	0
> "01/10/2003"	1	7	0
> "20/10/2003"	1	7	0
> "07/04/2004"	1	7	0
> "08/04/2004"	1	7	0
> "16/04/2004"	1	7	0
> "25/10/2004"	1	7	1
> "12/11/2004"	1	7	0
> "07/04/2005"	1	7	1
> "27/04/2005"	1	7	1
> "25/08/2005"	1	7	0
> "05/09/2005"	1	7	0
> "12/09/2005"	1	7	0
> "25/10/2005"	1	7	1
> "25/04/2006"	1	7	1
> "25/04/2006"	1	7	1
> "05/09/2006"	1	7	0
> "05/09/2006"	1	7	0
> "28/09/2006"	1	7	0
> "28/09/2006"	1	7	0
> "28/09/2006"	1	7	0
> "28/09/2006"	1	7	0
> "28/09/2006"	1	7	0
> "25/10/2006"	1	7	1
> "27/04/2007"	1	7	1
> "27/04/2007"	1	7	1
> "06/09/2007"	1	7	1
> "06/09/2007"	1	7	1
> "06/09/2007"	1	7	1
> "06/09/2007"	1	7	1
> end
> drop mark
> 
> **format dates
> *see -help dates-
> g date = date(fulldate, "DMY")
> format date %td
> g month = month(date)
> g year = year(date)
> 
> 
> **create mark2
> sort director_id  month  year
> bys director_id  month : g mark2 = 1 if _n== _N recode mark2 (.=0)
> 
> **example:
> l if director_id == 7 & month ==4 //note the repeat years within april for
> director "7"
> ****************!
> 
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> ebooth@ppri.tamu.edu
> Office: +979.845.6754
> 
> 
> On Mar 13, 2012, at 7:55 AM, Agnieszka Trzeciakiewicz wrote:
> 
>> Hi!
>> I'm trying to mark an occurrence of an event in a particular firm, 
>> made by a particular director.
>> 
>> The idea is to mark an occurrence with number 1, if there is an 
>> available record in the same month over the past three consecutive years.
>> For instance if for a director A there is a record in 09/2000, 
>> 09/2001, and
>> 09/2002 I would like to assign year 2000 with mark 0, year 2001 with 
>> mark 0 and 2002 with mark 1.
>> 
>> I have used Excel to solve my problem, and applied a relevant long formula
> .
>> However, Excel broke down each time I run it. The original file has 
>> over
>> 50000 records. Please find the table below.
>> 
>> Is it possible to create ''mark'' column using STATA software and its 
>> data management codes?
>> Thanks for your help.
>> Best wishes,
>> Agnieszka
>> 
>> Table:
>> full date	firm_id	director_id	mark
>> 22/10/2002	1	7	0
>> 07/04/2003	1	7	0
>> 11/04/2003	1	7	0
>> 01/10/2003	1	7	0
>> 01/10/2003	1	7	0
>> 20/10/2003	1	7	0
>> 07/04/2004	1	7	0
>> 08/04/2004	1	7	0
>> 16/04/2004	1	7	0
>> 25/10/2004	1	7	1
>> 12/11/2004	1	7	0
>> 07/04/2005	1	7	1
>> 27/04/2005	1	7	1
>> 25/08/2005	1	7	0
>> 05/09/2005	1	7	0
>> 12/09/2005	1	7	0
>> 25/10/2005	1	7	1
>> 25/04/2006	1	7	1
>> 25/04/2006	1	7	1
>> 05/09/2006	1	7	0
>> 05/09/2006	1	7	0
>> 28/09/2006	1	7	0
>> 28/09/2006	1	7	0
>> 28/09/2006	1	7	0
>> 28/09/2006	1	7	0
>> 28/09/2006	1	7	0
>> 25/10/2006	1	7	1
>> 27/04/2007	1	7	1
>> 27/04/2007	1	7	1
>> 06/09/2007	1	7	1
>> 06/09/2007	1	7	1
>> 06/09/2007	1	7	1
>> 06/09/2007	1	7	1
>> 06/09/2007	1	7	1
>> 26/10/2007	1	7	1
>> 01/11/2007	1	7	0
>> 31/03/2008	1	7	0
>> 31/03/2008	1	7	0
>> 24/04/2008	1	7	1
>> 24/04/2008	1	7	1
>> 03/09/2008	1	7	1
>> 03/09/2008	1	7	1
>> 16/09/2008	1	7	1
>> 14/10/2008	1	7	1
>> 07/04/2009	1	7	1
>> 07/04/2009	1	7	1
>> 07/09/2009	55	7	0
>> 07/09/2009	55	7	0
>> 07/09/2009	55	7	0
>> 23/09/2009	55	7	0
>> 23/09/2009	55	7	0
>> 23/09/2009	1	7	1
>> 08/10/2009	1	7	1
>> 22/03/2010	1	7	0
>> 22/03/2010	1	7	0
>> 23/03/2010	1	7	0
>> 23/03/2010	1	7	0
>> 23/03/2010	1	7	0
>> 12/04/2010	1	7	1
>> 07/09/2010	1	7	1
>> 07/09/2010	1	7	1
>> 07/09/2010	1	7	1
>> 07/09/2010	1	7	1
>> 07/09/2010	1	7	1
>> 10/09/2010	1	7	1
>> 06/10/2010	1	7	1
>> 27/05/2009	2	8	0
>> 09/12/2009	2	8	0
>> 05/02/2010	2	8	0
>> 20/05/2010	2	8	0
>> 23/08/2010	2	8	0
>> 22/04/2008	3	10	0
>> 18/12/2008	3	10	0
>> 24/04/2009	3	10	0
>> 24/04/2009	3	10	0
>> 19/03/2010	3	10	0
>> 19/03/2010	3	10	0
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index