Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Hey Sky <heyskywalker@yahoo.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: how to get the mean value for this panel data? |
Date | Wed, 16 Feb 2011 19:30:00 -0800 (PST) |
Hey David thanks for your answer. I have test the code you suggested and it works perfect for the example data. I will test it tomorrow for the real data and believe it would not have problem. if there has, I will come back and ask for your help. thanks for your kind help. Have a Nice Evening Nan from Montreal ----- Original Message ---- From: David Kantor <kantor.d@att.net> To: statalist@hsphsun2.harvard.edu Sent: Wed, February 16, 2011 10:09:29 PM Subject: Re: st: how to get the mean value for this panel data? At 09:36 PM 2/16/2011, Hey Sky wrote: >Dear Stataers > >I have met a problem to calculate mean value in panel data. example data and >detail of my question as follows. > > >* example data >clear >input id jobid wage1 wage2 wage3 >1 1 1 2 . >1 2 . 3 . >2 1 1 . . >2 2 . 2 2 >3 1 1 2 . >3 2 . 3 . >3 3 . . 4 >4 1 2 4 6 >5 1 2 3 . >6 1 2 . 4 >end > > >the wage1~3 represents the wages in each wave if the person works. >for example, >the id=1 person works in 1st and 2nd wave but not 3rd. he switches job in 2nd >wave thus he has two wage for jobid=1, his first job. surely for >id=2 person, he >has two jobs, switches job in 2nd wave and keeps it till 3rd wave. >for id=4, he >keeps his job for all 3 waves and id=5 lost his only job in 3rd wave. > > >now, I need to get a mean wage for this 3 waves panel data. if they have not >switched job, use this job's wage in each wave to calculate the mean wage; >otherwise, use the new job wage in a new wave. > > >for example,for people who did not switch job, such as id=4, the mean wage >should equal to (2+4+6)/3, which egen x=rowmean(wage1 wage2 wage3) >can do. but >how about id=1? his mean wage should equal to (1+3)/2; or id=2, mean >wage=(1+2+2)/3; and id=3, mean wage=(1+3+4)/3. how to have a code to >deal with >all these situations automatically? > > >I have thousands obs in my real data. thus it is impossible to do it by hand. >thanks for any suggestions. I would first reshape this to long. See -help reshape- reshape long wage , i(id jobid) j(wave) You would then have, say for id=1, id jobid wave wage 1 1 1 1 1 1 2 2 1 1 3 . 1 2 1 . 1 2 2 3 1 2 3 . Get rid of missings: drop if mi(wage) Now, in case there are multiple wage values for the same id and wave, we want to select one preferred value. Presuming that a higher jobid signifies a later time, and therefore, a wage value that supersedes the others, we want to select the latest one: sort id wave jobid (Actually, I would do -assertky id wave jobid-; see -ssc desc assertky-.) Note that wave comes before jobid; we are changing the sort order. by id wave (jobid): keep if _n==_N The dataset is now unique by id and wave, with the preferred values selected. At this point either... by id: egen meanwage = mean(wage) or do a -collapse-. (All suggested code is untested.) HTH --David * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/