Erik Aadland <erikaadland@hotmail.com>

<statalist@hsphsun2.harvard.edu>

RE: st: Identifying and recording the first occurrence of an event by actor given category

Fri, 7 Sep 2012 11:36:10 +0000

Thank you, Nick, for your solution and for the interesting reference!

To make the first occurrence conditional on a specific category value (e.g. 2), I modified the code as follows:

    egen first_1 = min(year / (event == 1 & category_id == 2)), by(actor_id)

This modification appears to work well, too.

Kind regards,
Erik

> Date: Fri, 7 Sep 2012 10:47:31 +0100
> Subject: Re: st: Identifying and recording the first occurrence of an event by actor given category
> From: njcoxstata@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> Stata is great at this kind of problem. The essence of Erik's
> difficulty is the need to look in other observations for the same
> panel to produce the new variable.
>
> First off, the first year anything occurred is just the minimum year
> anything occurred, so we can get at that minimum in several ways:
> sorting, using -summarize-, -egen- etc.
>
> Given the panel structure, -egen- is a good tool, because functions
> that support a -by()- option or a -by:- prefix will handle panels
> separately.
>
> Here is one solution:
>
> egen first_1 = min(year / (event == 1)), by(actor_id)
>
> Here is another:
>
> egen first_1 = min(cond(event == 1, year, .)), by(actor_id)
>
> This approach is discussed in detail within
>
> Cox, N.J. 2011. Speaking Stata: Compared with ... Stata Journal 11(2): 305-314
>
> Abstract. Many problems in data management center on relating values
> to values in other observations, either within a dataset as a whole or
> within groups such as panels. This column reviews some basic Stata
> techniques helpful for such tasks, including the use of subscripts,
> summarize, by:, sum(), cond(), and egen. Several techniques exploit
> the fact that logical expressions yield 1 when true and 0 when false.
> Dividing by zero to yield missings is revealed as a surprisingly
> valuable device.
>
> Erik's question appears a bit more complicated than I have answered
> here; if there is some twist I have missed no doubt he will make that
> clear.
>
> Nick
>
> On Fri, Sep 7, 2012 at 10:07 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
>
> > I have an unbalanced panel dataset.
> > This is the structure:
> >
> > actor_id  year  category_id  event
> > 1         2000  1            .
> > 1         2000  2            1
> > 1         2001  2            1
> > 2         2003  3            .
> > 2         2003  2            1
> > 2         2004  2            .
> >
> > I want to generate a variable -first_occurrence- that identifies and records for each actor_id the first time the actor experienced event = 1 if the category = e.g. 2. I would like this -first_occurrence- variable to capture the value of -year- at the time of first event occurrence. Some actors never experience event = 1.
> >
> > For instance, if I track first occurrence by category_id = 2, this is what I look for:
> >
> > actor_id  year  category_id  event  first_occurrence
> > 1         2000  1            .      2000
> > 1         2000  2            1      2000
> > 1         2001  2            1      2000
> > 2         2003  3            .      2003
> > 2         2003  2            1      2003
> > 2         2004  2            .      2003
> >
> > Any input or suggestions on this problem would be greatly appreciated.
> >
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
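
The division device used in this thread can be checked against the six example observations from the original question. What follows is only a minimal sketch: it assumes the data are typed in with -input- exactly as posted, and the variable name first_alt is introduced here purely for illustration. The logical expression evaluates to 1 or 0, so year is divided by 1 (leaving year) or by 0 (yielding missing), and min() ignores the missings within each actor_id.

```stata
* Rebuild the six example observations from the original question
clear
input actor_id year category_id event
1 2000 1 .
1 2000 2 1
1 2001 2 1
2 2003 3 .
2 2003 2 1
2 2004 2 .
end

* (event == 1 & category_id == 2) is 1 when true and 0 when false.
* year / 1 is year; year / 0 is missing, which min() ignores.
egen first_occurrence = min(year / (event == 1 & category_id == 2)), by(actor_id)

* Equivalent form using cond(); first_alt is a name chosen for this sketch
egen first_alt = min(cond(event == 1 & category_id == 2, year, .)), by(actor_id)

list, sepby(actor_id)
```

On these data both variables should hold 2000 in every observation for actor 1 and 2003 in every observation for actor 2, matching the table in the question. An actor with no observation satisfying the condition would be expected to get missing throughout, because every value passed to min() is then missing.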

