```
Svend's solution is good. Lovers of brevity will note
that it can be collapsed to a single line:

bysort pid : egen wave1 = max(wave == 1)

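The expression -wave == 1- evaluates to 1 when true and
0 when false, so its maximum within a panel is 1 precisely
when wave 1 is present. Here -total()- will do as well as
-max()-, so long as no panel has more than one observation
for wave 1.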

For a more general discussion, see a posting
on selection of panels on 2 April:

http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0704/Author/article-30.html

Suppose you have some response -y-. Its
value for wave 1 can be captured thus:

bysort pid : egen y1 = total(cond(wave == 1, y, .))

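Here -max()- and -min()- would do as well as -total()-,
with one difference. If -wave- is never 1 in a panel,
-max()- and -min()- give missings everywhere in that panel,
whereas -total()- gives 0 unless you add its -missing- option.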

Under way is a revision of

How can I generate a variable relating panel data to a reference panel?
http://www.stata.com/support/faqs/stat/panelref.html

which extends the discussion to

How can I generate a variable relating panel data to a reference panel or time?

But that may not be ready for some weeks. However, the Stata logic
is the same, really.

Nick
n.j.cox@durham.ac.uk

Svend Juul


> 1. I have an unbalanced panel and wish to create a variable that
> identifies the presence of wave 1 respondents in subsequent waves in
> order to test for attrition.

> 2. I wish to distribute the value of wave 1 dependent variables in the
> unbalanced panel over subsequent waves.
>
>         +----------------------------+
>           pid         wave    wave1
>         ----------------------------
>    1.     10016872      2        0
>    2.     10016872      4        0
>         ----------------------------
>    3.     10017992      1        1
>    4.     10017992      2        1
>    5.     10017992      3        1
>    6.     10017992      4        1
>    7.     10017992      6        1
>         ----------------------------
>           00040404      2        0
>         ----------------------------
>   19.     10040439      1        1
>   20.     10040439      2        1

-------------------------------------------------

> This does it:
>
>    recode wave(1=1)(*=0) , generate(wave1)
>    by pid, sort: egen wave2=max(wave1)

```
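
For anyone who wants to try the two one-liners from the reply, here is a minimal sketch on a toy unbalanced panel. The data values are invented for illustration and are not from the original thread. The second command uses -max()- rather than -total()-, so that a panel with no wave 1 observation comes out as missing.

```stata
* Toy unbalanced panel; the pid, wave and y values are invented for illustration.
clear
input long pid byte wave float y
1 1 10
1 2 12
1 3 15
2 2 20
2 4 22
3 1 30
3 2 31
end

* Indicator: 1 in every observation of a panel that includes wave 1, else 0.
bysort pid : egen wave1 = max(wave == 1)

* Wave 1 value of y spread over the whole panel. -max()- ignores missings,
* so it returns the single nonmissing value, or missing if there is none.
bysort pid : egen y1 = max(cond(wave == 1, y, .))

* Checks: panel 2 never has wave 1, so it gets 0 and missing respectively.
assert wave1 == 1 if inlist(pid, 1, 3)
assert wave1 == 0 if pid == 2
assert y1 == 10 if pid == 1
assert missing(y1) if pid == 2
assert y1 == 30 if pid == 3

list, sepby(pid)
```

Using -bysort- in both commands means neither depends on the data already being sorted by -pid-.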