
# Re: st: Identifying and recording the first occurrence of an event by actor given category

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Identifying and recording the first occurrence of an event by actor given category
Date: Fri, 7 Sep 2012 10:47:31 +0100

```
Stata is great at this kind of problem. The essence of Erik's
difficulty is the need to look in other observations for the same
panel to produce the new variable.

First off, the first year anything occurred is just the minimum year
anything occurred, so we can get at that minimum in several ways:
sorting, using -summarize-, using -egen-, etc.

Given the panel structure, -egen- is a good tool, because functions
that support a -by()- option or a -by:- prefix will handle panels
separately.

Here is one solution:

egen first_1 = min(year / (event == 1)), by(actor_id)

Here is another:

egen first_1 = min(cond(event == 1, year, .)), by(actor_id)

This approach is discussed in detail in

Cox, N.J. 2011. Speaking Stata: Compared with ... Stata Journal 11(2): 305-314

Abstract.  Many problems in data management center on relating values
to values in other observations, either within a dataset as a whole or
within groups such as panels. This column reviews some basic Stata
techniques, including summarize, by:, sum(), cond(), and egen. Several
techniques exploit
the fact that logical expressions yield 1 when true and 0 when false.
Dividing by zero to yield missings is revealed as a surprisingly
valuable device.

Erik's question appears a bit more complicated than I have answered
here; if there is some twist I have missed no doubt he will make that
clear.

Nick

On Fri, Sep 7, 2012 at 10:07 AM, Erik Aadland <erikaadland@hotmail.com> wrote:

> I have an unbalanced panel dataset.
> This is the structure:
> actor_id    year    category_id    event
> 1           2000    1              .
> 1           2000    2              1
> 1           2001    2              1
> 2           2003    3              .
> 2           2003    2              1
> 2           2004    2              .
>
> I want to generate a variable -first_occurrence- that identifies and records, for each actor_id, the first time the actor experienced event = 1 within a given category, e.g. category_id = 2. I would like this -first_occurrence- variable to capture the value of -year- at the time of the first event occurrence. Some actors never experience event = 1.
> For instance, if I track first occurrence by category_id = 2, this is what I look for:
> actor_id    year    category_id    event    first_occurrence
> 1           2000    1              .        2000
> 1           2000    2              1        2000
> 1           2001    2              1        2000
> 2           2003    3              .        2003
> 2           2003    2              1        2003
> 2           2004    2              .        2003
>
> Any input or suggestions on this problem would be greatly appreciated.

```
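Both -egen- idioms from the reply can be checked against Erik's six example observations. What follows is a minimal sketch, not a tested solution for his real data. Adding `category_id == 2` to the condition is only a guess at the "twist" Nick mentions, and the variable names `first_div` and `first_cond` are made up for illustration.

```stata
* Sketch only: rebuild Erik's example data from the quoted message.
clear
input actor_id year category_id event
1 2000 1 .
1 2000 2 1
1 2001 2 1
2 2003 3 .
2 2003 2 1
2 2004 2 .
end

* Division idiom: the logical expression is 1 when true, 0 when false.
* year/1 is year; year/0 is missing; min() ignores missings.
* The category_id == 2 restriction is an assumption about what Erik wants.
egen first_div = min(year / (event == 1 & category_id == 2)), by(actor_id)

* cond() idiom: the same logic written out explicitly.
egen first_cond = min(cond(event == 1 & category_id == 2, year, .)), by(actor_id)

list, sepby(actor_id)
```

A missing `event` makes `event == 1` false, so those observations contribute a missing value and drop out of the minimum. An actor with no qualifying observation at all should end up with missing in the new variable, which seems to fit Erik's remark that some actors never experience the event.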