Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: how to get the mean value for this panel data?

From   David Kantor <>
Subject   Re: st: how to get the mean value for this panel data?
Date   Wed, 16 Feb 2011 22:09:29 -0500

At 09:36 PM 2/16/2011, Hey Sky wrote:
Dear Stataers

I have met a problem to calculate mean value in panel data. example data and
detail of my question as follows.

* example data
input id jobid wage1 wage2 wage3
1 1 1 2 .
1 2 . 3 .
2 1 1 . .
2 2 . 2 2
3 1 1 2 .
3 2 . 3 .
3 3 . . 4
4 1 2 4 6
5 1 2 3 .
6 1 2 . 4

the wage1~3 represents the wages in each wave if the person works. for example,
the id=1 person works in 1st and 2nd wave but not 3rd. he switches job in 2nd
wave thus he has two wage for jobid=1, his first job. surely for id=2 person, he has two jobs, switches job in 2nd wave and keeps it till 3rd wave. for id=4, he
keeps his job for all 3 waves and id=5 lost his only job in 3rd wave.

now, I need to get a mean wage for this 3 waves panel data. if they have not
switched job, use this job's wage in each wave to calculate the mean wage;
otherwise, use the new job wage in a new wave.

for example,for people who did not switch job, such as id=4, the mean wage
should equal to (2+4+6)/3, which egen x=rowmean(wage1 wage2 wage3) can do. but
how about id=1? his mean wage should equal to (1+3)/2; or id=2, mean
wage=(1+2+2)/3; and id=3, mean wage=(1+3+4)/3. how to have a code to deal with
all these situations automatically?

I have thousands obs in my real data. thus it is impossible to do it by hand.
thanks  for any suggestions.

I would first reshape this to long. See -help reshape-
reshape long wage , i(id jobid) j(wave)
You would then have, say for id=1,
id jobid wave wage
1  1  1  1
1  1  2  2
1  1  3  .
1  2  1  .
1  2  2  3
1  2  3  .

Get rid of missings:
drop if mi(wage)

Now, in case there are multiple wage values for the same id and wave, we want to select one preferred value. Presuming that a higher jobid signifies a later time, and therefore, a wage value that supersedes the others, we want to select the latest one:
sort id wave jobid
(Actually, I would do -assertky id wave jobid-; see -ssc desc assertky-.)
Note that wave comes before jobid; we are changing the sort order.
by id wave (jobid): keep if _n==_N

The dataset is now unique by id and wave, with the preferred values selected.

At this point either...
 by id: egen meanwage = mean(wage)
or do a -collapse-.

(All suggested code is untested.)

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index