From "Svend Juul" To Subject Re: st: PLEASE HELP ME! Date Sun, 20 May 2007 14:53:32 +0200

```
Frank wrote:

Can anyone help me to deal with the following dataset?
I have thought about it for the whole day...

Here is the simplified version of my dataset

     ID  wage  jobtype  period1  period2  period3
 1.   1    30        1        1        1        0
 2.   1    20        2        1        0        1
 3.   2    40        1        1        1        0
 4.   2    35        1        0        1        1
 5.   2    10        2        0        0        1

That is, in this dataset there are N individuals and M periods.
In each period each individual can have one or more jobs.
Each job is accompanied by some wage.

In the real dataset, both N and M are large. But here is one
simplified example with N=2, M=3. period1, period2, period3 are
time dummies. The example says that in period 1, individual 1
has two jobs (type 1 and type 2) and the associated wages are 30
and 20 respectively. In period 2, individual 1 has only one job,
which is type 1. In period 3, individual 1 has one job, which is
type 2. Similarly, we can read the job and wage information for
individual 2 during these three periods.

My question is how to write code to generate variables
that contain the following information: (1) the number of jobs
each individual has in each period; (2) the maximum wage for
each individual in each period. That is, I want to obtain the
following information from the above dataset, where, for example,
(2, 30) means that in period 1 individual 1 has 2 jobs, and the
maximum wage is 30.

     ID  period1   period2   period3
 1.   1  (2, 30)   (1, 30)   (1, 20)
 2.   2  (1, 40)   (2, 40)   (2, 35)

--------------------------------------------------------------

Here is a try. This is the test data:
     +---------------------------------------------------+
     | id   wage   jobtype   period1   period2   period3 |
     |---------------------------------------------------|
  1. |  1     30         1         1         1         0 |
  2. |  1     20         2         1         0         1 |
  3. |  2     40         1         1         1         0 |
  4. |  2     35         1         0         1         1 |
  5. |  2     10         2         0         0         1 |
     +---------------------------------------------------+

. // A long format is easier to work with. -reshape- needs
. // a unique identifier for each id-period combination
. gen id1 = _n
. reshape long period , i(id1) j(per)
.
. // We can drop some observations and id1.
. drop if period==0
. drop id1

. // Here we go.
. // NB! -nvals()- is an unofficial -egenmore- function,
. // and you may need to:
. //    ssc install egenmore
. sort id per
. by id per: egen maxwage=max(wage)
. by id per: egen njobs=nvals(jobtype)
. by id per: keep if _n==1
. keep id per maxwage njobs

. // If you prefer the wide format:
. reshape wide maxwage njobs , i(id) j(per)
. list
     +----------------------------------------------------------------+
     | id   maxwage1   njobs1   maxwage2   njobs2   maxwage3   njobs3 |
     |----------------------------------------------------------------|
  1. |  1         30        2         30        1         20        1 |
  2. |  2         40        1         40        1         35        2 |
     +----------------------------------------------------------------+

Hope this helps
Svend

P.S. Try to give informative subject information; it is much
more likely to create interest among the potential responders.
"PLEASE HELP ME!" does not tell me whether it is something I
can help with.

________________________________________________________

Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(Institute of Public Health, Department of Epidemiology)
Vennelyst Boulevard 6
DK-8000 Aarhus C,  Denmark
Phone, work:  +45 8942 6090
Phone, home:  +45 8693 7796
Fax:          +45 8613 1580
E-mail:       sj@soci.au.dk
_________________________________________________________

```
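
A note on the job count, with a variant sketch that is not part of Svend's reply. -nvals(jobtype)- counts distinct job types, so two jobs of the same type held in one period are counted once. That is why the listing shows njobs2 = 1 for id 2, where Frank asked for (2, 40). Assuming every row of the original data is one job, the rows can be counted instead. The version below swaps in official -collapse- for the -egen- steps (a substitution, not the original method) and keeps the variable names used in the reply.

```stata
* Sketch only: assumes each row of the input is exactly one job.
clear
input id wage jobtype period1 period2 period3
1 30 1 1 1 0
1 20 2 1 0 1
2 40 1 1 1 0
2 35 1 0 1 1
2 10 2 0 0 1
end

* One row per job-period combination, as in the reply
gen id1 = _n
reshape long period, i(id1) j(per)

* Keep only the periods in which the job was actually held
keep if period == 1

* Per id and period: highest wage, and number of rows (jobs)
* with a nonmissing wage
collapse (max) maxwage = wage (count) njobs = wage, by(id per)

* Back to one row per individual
reshape wide maxwage njobs, i(id) j(per)
list
```

On the test data this gives the pairs Frank listed: (2, 30), (1, 30), (1, 20) for id 1 and (1, 40), (2, 40), (2, 35) for id 2. Note that (count) skips rows with a missing wage; if wages can be missing, counting on a variable that is never missing would be the safer choice.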