Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Eric Booth <eric.a.booth@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Data management _flag on the basis of the frequency |
Date | Tue, 13 Mar 2012 10:44:46 -0500 |
Yes, I did miss the "past 3 years" part (also, you said "consecutive" - I assume this condition still holds (?)). Following your example for director A -- How would I know to mark 2002 as the year with 2 prior, consecutive years[1] and not 2003 (or better, shouldn't I mark 2002 and 2003 since they both have 2 consecutive years?) or 2004.... Choosing to mark 2002 over 2003 in your example seems arbitrary. If you want to mark any observation with 2 prior, consecutive years, removing duplicate years (within the same director_id/month) makes this much much easier. However, if you do not want to remove duplicate years, the strategy I prefer is to -reshape- to wide, then mark your consecutive years and then -reshape- back to long format. Here's an example of both options (again, I mark ANY observation with 2 prior, consecutive years -- if there is some decision rule about why you picked 2002 and not 2003 & 2004 in your example, please make that clear or make changes to my code where directed below to take that decision rule into account): ********************! clear **data ex. shortened a bit:===> inp str15 fulldate firm_id director_id mark "22/10/2002" 1 7 0 "07/04/2003" 1 7 0 "11/04/2003" 1 7 0 "01/10/2003" 1 7 0 "01/10/2003" 1 7 0 "20/10/2003" 1 7 0 "07/04/2004" 1 7 0 "08/04/2004" 1 7 0 "16/04/2004" 1 7 0 "25/10/2004" 1 7 1 "12/11/2004" 1 7 0 "07/04/2005" 1 7 1 "27/04/2005" 1 7 1 "25/08/2005" 1 7 0 "05/09/2005" 1 7 0 "12/09/2005" 1 7 0 "25/10/2005" 1 7 1 "25/04/2006" 1 7 1 "25/04/2006" 1 7 1 "05/09/2006" 1 7 0 "05/09/2006" 1 7 0 "28/09/2006" 1 7 0 "28/09/2006" 1 7 0 "28/09/2006" 1 7 0 "28/09/2006" 1 7 0 "28/09/2006" 1 7 0 "25/10/2006" 1 7 1 "27/04/2007" 1 7 1 "27/04/2007" 1 7 1 "06/09/2007" 1 7 1 "06/09/2007" 1 7 1 "06/09/2007" 1 7 1 "06/09/2007" 1 7 1 end drop mark **format dates *see -help dates- g date = date(fulldate, "DMY") format date %td g month = month(date) g year = year(date) drop fulldate sort director_id month year **first, duplicate years breaks this: bys director_id month: g mark = 1 if /// year[_n-1] == year-1 & year[_n-2] == year-2 drop mark *********************** **Option 1: so removing duplicates makes this work: preserve duplicates drop director_id month year, force bys director_id month: g mark2 = 1 if /// year[_n-1] == year-1 & year[_n-2] == year-2 recode mark2 (.=0) l if director_id == 7 & month ==4 restore *********************** **Option 2: without removing duplicates: bys director_id month year: g dups = _n reshape wide date, i(firm_id director_id month year) j(dups) **now you can decide which 3 consecutive years make sense to mark and reshape back: bys director_id month: g mark = 1 if /// year[_n-1] == year-1 & year[_n-2] == year-2 **insert some other rule about how to pick the year to look backward from** **instead this marks any year with 2 prior, consecutive years by director id & month **reshape back to long form: reshape long date, i(firm_id director_id month year) j(dups) drop if mi(date) **mark only one obs of each director/month/year combo bys director_id month year: replace mark= . if _n != _N *********************! - Eric __ Eric A. Booth Public Policy Research Institute Texas A&M University ebooth@ppri.tamu.edu Office: +979.845.6754 [1] You say "past 3 years", but really it is 2 years prior to the year of interest On Mar 13, 2012, at 10:05 AM, Agnieszka Trzeciakiewicz wrote: > Eric, thank you for your response. > The idea is to see if in the past 3 years there is a record in the same > month. It does not matter which year is the last one. What matters is that > there are 2 years prior to that record which included records from the same > month. > > Following my previous example, for a director A there is a record in > 09/2000, 09/2001, > 09/2002, 09/2003, 09/2004 I would like to assign year 2000 with mark 0, year > 2001 with mark 0 and 2002 with mark 1 and 2003 with mark zero. No matter how > many transactions appeared in the same firm, made by the same director in a > particular September. If however in September 2005 there is no transaction, > and the same director of the same firm makes a transaction in September > 2006, I would like to assign 0 to this record (as, there are no records from > 3 past consecutive years). > > Please note, that director A may work in 2 firms, therefore the command must > be dependent on a firm as well. > Thanks > > > -----Original Message----- > From: owner-statalist@hsphsun2.harvard.edu > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Eric Booth > Sent: 13 March 2012 14:48 > To: statalist@hsphsun2.harvard.edu > Subject: Re: st: Data management _flag on the basis of the frequency > > <> > > > Since you've got repeat years (see below) that could be the last year that > you want to mark with a 1 -- you need to decide if there is some kind of > decision rule for which observation to mark. So, for director "7" in month > "4" your last year is 2007, but there are several observations. You need to > decide if there is some other variable that determines which is the last > observation or perhaps -collapse- the data in some way to remove this > duplication. Without making any corrections for this issue, here is an > example of how to mark your observations of interest. > > ****************! > clear > **data ex. shortened a bit:===> > inp str15 fulldate firm_id director_id mark > "22/10/2002" 1 7 0 > "07/04/2003" 1 7 0 > "11/04/2003" 1 7 0 > "01/10/2003" 1 7 0 > "01/10/2003" 1 7 0 > "20/10/2003" 1 7 0 > "07/04/2004" 1 7 0 > "08/04/2004" 1 7 0 > "16/04/2004" 1 7 0 > "25/10/2004" 1 7 1 > "12/11/2004" 1 7 0 > "07/04/2005" 1 7 1 > "27/04/2005" 1 7 1 > "25/08/2005" 1 7 0 > "05/09/2005" 1 7 0 > "12/09/2005" 1 7 0 > "25/10/2005" 1 7 1 > "25/04/2006" 1 7 1 > "25/04/2006" 1 7 1 > "05/09/2006" 1 7 0 > "05/09/2006" 1 7 0 > "28/09/2006" 1 7 0 > "28/09/2006" 1 7 0 > "28/09/2006" 1 7 0 > "28/09/2006" 1 7 0 > "28/09/2006" 1 7 0 > "25/10/2006" 1 7 1 > "27/04/2007" 1 7 1 > "27/04/2007" 1 7 1 > "06/09/2007" 1 7 1 > "06/09/2007" 1 7 1 > "06/09/2007" 1 7 1 > "06/09/2007" 1 7 1 > end > drop mark > > **format dates > *see -help dates- > g date = date(fulldate, "DMY") > format date %td > g month = month(date) > g year = year(date) > > > **create mark2 > sort director_id month year > bys director_id month : g mark2 = 1 if _n== _N recode mark2 (.=0) > > **example: > l if director_id == 7 & month ==4 //note the repeat years within april for > director "7" > ****************! > > - Eric > __ > Eric A. Booth > Public Policy Research Institute > Texas A&M University > ebooth@ppri.tamu.edu > Office: +979.845.6754 > > > On Mar 13, 2012, at 7:55 AM, Agnieszka Trzeciakiewicz wrote: > >> Hi! >> I'm trying to mark an occurrence of an event in a particular firm, >> made by a particular director. >> >> The idea is to mark an occurrence with number 1, if there is an >> available record in the same month over the past three consecutive years. >> For instance if for a director A there is a record in 09/2000, >> 09/2001, and >> 09/2002 I would like to assign year 2000 with mark 0, year 2001 with >> mark 0 and 2002 with mark 1. >> >> I have used Excel to solve my problem, and applied a relevant long formula > . >> However, Excel broke down each time I run it. The original file has >> over >> 50000 records. Please find the table below. >> >> Is it possible to create ''mark'' column using STATA software and its >> data management codes? >> Thanks for your help. >> Best wishes, >> Agnieszka >> >> Table: >> full date firm_id director_id mark >> 22/10/2002 1 7 0 >> 07/04/2003 1 7 0 >> 11/04/2003 1 7 0 >> 01/10/2003 1 7 0 >> 01/10/2003 1 7 0 >> 20/10/2003 1 7 0 >> 07/04/2004 1 7 0 >> 08/04/2004 1 7 0 >> 16/04/2004 1 7 0 >> 25/10/2004 1 7 1 >> 12/11/2004 1 7 0 >> 07/04/2005 1 7 1 >> 27/04/2005 1 7 1 >> 25/08/2005 1 7 0 >> 05/09/2005 1 7 0 >> 12/09/2005 1 7 0 >> 25/10/2005 1 7 1 >> 25/04/2006 1 7 1 >> 25/04/2006 1 7 1 >> 05/09/2006 1 7 0 >> 05/09/2006 1 7 0 >> 28/09/2006 1 7 0 >> 28/09/2006 1 7 0 >> 28/09/2006 1 7 0 >> 28/09/2006 1 7 0 >> 28/09/2006 1 7 0 >> 25/10/2006 1 7 1 >> 27/04/2007 1 7 1 >> 27/04/2007 1 7 1 >> 06/09/2007 1 7 1 >> 06/09/2007 1 7 1 >> 06/09/2007 1 7 1 >> 06/09/2007 1 7 1 >> 06/09/2007 1 7 1 >> 26/10/2007 1 7 1 >> 01/11/2007 1 7 0 >> 31/03/2008 1 7 0 >> 31/03/2008 1 7 0 >> 24/04/2008 1 7 1 >> 24/04/2008 1 7 1 >> 03/09/2008 1 7 1 >> 03/09/2008 1 7 1 >> 16/09/2008 1 7 1 >> 14/10/2008 1 7 1 >> 07/04/2009 1 7 1 >> 07/04/2009 1 7 1 >> 07/09/2009 55 7 0 >> 07/09/2009 55 7 0 >> 07/09/2009 55 7 0 >> 23/09/2009 55 7 0 >> 23/09/2009 55 7 0 >> 23/09/2009 1 7 1 >> 08/10/2009 1 7 1 >> 22/03/2010 1 7 0 >> 22/03/2010 1 7 0 >> 23/03/2010 1 7 0 >> 23/03/2010 1 7 0 >> 23/03/2010 1 7 0 >> 12/04/2010 1 7 1 >> 07/09/2010 1 7 1 >> 07/09/2010 1 7 1 >> 07/09/2010 1 7 1 >> 07/09/2010 1 7 1 >> 07/09/2010 1 7 1 >> 10/09/2010 1 7 1 >> 06/10/2010 1 7 1 >> 27/05/2009 2 8 0 >> 09/12/2009 2 8 0 >> 05/02/2010 2 8 0 >> 20/05/2010 2 8 0 >> 23/08/2010 2 8 0 >> 22/04/2008 3 10 0 >> 18/12/2008 3 10 0 >> 24/04/2009 3 10 0 >> 24/04/2009 3 10 0 >> 19/03/2010 3 10 0 >> 19/03/2010 3 10 0 >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/