Thanks for the confirmation. Two points remain:
1. Each solution has to be extended to multiple files.
2. Which is faster?
Nick
n.j.cox@durham.ac.uk
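
One way to approach the "which is faster?" question is to wrap each candidate in Stata's -timer- commands. What follows is a minimal sketch, not a benchmark from this thread: the toy data (1,000 observations, random friend ids, the seed) are invented for illustration, and it assumes a Stata version that has runiform(). It times only the loop-over-observations approach quoted below; the -merge- approach can be timed the same way under a second timer number.

```stata
* Hypothetical toy data, invented for illustration only
clear
set obs 1000
set seed 12345
gen id = _n
gen gpa = 1 + 3*runiform()
forvalues j = 1/4 {
    gen friend`j' = ceil(1000*runiform())
}

* Time the loop over observations with timer 1
timer clear
timer on 1
gen gpa_f = .
quietly forvalues i = 1/`=_N' {
    summarize gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
        `=friend3[`i']', `=friend4[`i']'), meanonly
    replace gpa_f = r(mean) in `i'
}
timer off 1

* Report elapsed seconds for timer 1
timer list 1
```

Run this from a do-file, since the /// line continuation is not understood at the interactive prompt.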
Carter Rees
> Here is a bit of code (with naming conventions changed from the
> original), provided to me previously by Maarten Buis and later
> commented upon by Nick Cox. My question was essentially the same as
> yours, and a small modification of this code helped immensely. The
> final format will allow you to calculate a mean across your gpa
> variables.
>
> The original conversation can be found here:
> http://www.stata.com/statalist/archive/2006-03/msg00612.html
>
> So, Nick is correct in that there is a -merge- solution.
>
> * note: this will create the gpa variables associated with each nominated friend
> drop _all
> tempfile a
> input frnd gpa
> 99 2.5
> 88 3.1
> 77 4
> 66 1.8
> 55 3.6
> 44 2.9
> end
> sort frnd
> save test, replace
>
> drop _all
> input aid frnd1 frnd2 frnd3 frnd4
> 99 66 77 . .
> 88 77 99 . .
> 77 55 44 99 .
> 66 88 99 44 77
> 55 44 . . .
> 44 66 . . .
> end
>
>
> reshape long frnd, i(aid)
> drop if frnd ==.
> sort frnd
> merge frnd using test
> drop if _merge == 2
> drop _merge
> reshape wide frnd gpa, i(aid) j(_j)
> list
> save test2, replace
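
For readers on Stata 11 or later, the same reshape-and-merge idea can be sketched with the newer -merge m:1- syntax, followed by the row mean that the final wide format allows. This is only an illustration, not code posted in the thread: the variable names id, gpa and friend1-friend4 follow the original question, while slot, gpaf, gpa_f and the tempfile lookup are names assumed here. The toy data restate the example above.

```stata
* Toy data, restating the example above with the question's variable names
clear
input id gpa friend1 friend2 friend3 friend4
99 2.5 66 77 . .
88 3.1 77 99 . .
77 4.0 55 44 99 .
66 1.8 88 99 44 77
55 3.6 44 . . .
44 2.9 66 . . .
end

* Build a lookup file: one row per person, keyed on the name used for friends
preserve
keep id gpa
rename id friend
rename gpa gpaf
tempfile lookup
save `lookup'
restore

* One row per respondent-friend slot, then attach each friend's GPA
reshape long friend, i(id) j(slot)
merge m:1 friend using `lookup', keep(master match) nogenerate

* Back to one row per respondent, then average over the friends' GPAs
reshape wide friend gpaf, i(id) j(slot)
egen gpa_f = rowmean(gpaf1 gpaf2 gpaf3 gpaf4)
list id gpaf1 gpaf2 gpaf3 gpaf4 gpa_f
```

The gpaf variables are listed explicitly because, after -reshape wide-, they are interleaved with friend1-friend4, so a range such as gpaf1-gpaf4 would pick up the friend ids as well. Rows with a missing or unmatched friend id are kept here, so a respondent with no usable friends stays in the data with gpa_f missing.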
Nick Cox
> There is probably a -merge- solution.
>
> In this case, at worst, a solution is a single
> loop over observations.
>
> gen gpa_f = .
>
> qui forval i = 1/`=_N' {
>     su gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
>         `=friend3[`i']', `=friend4[`i']'), meanonly
> replace gpa_f = r(mean) in `i'
> }
>
> If your ids are string, then you need instead
>
> inlist(id, "`=friend1[`i']'", "`=friend2[`i']'", ///
>     "`=friend3[`i']'", "`=friend4[`i']'")
Chris Ruebeck
> > Suppose my data set has these 6 variables,
> >
> > id        : this respondent's ID,
> > gpa       : this respondent's GPA, and
> > friend1-4 : the IDs (possibly missing) of this respondent's friends.
> >
> > I would like to create four new variables that record the GPA of
> > each respondent's friends, and then take their average. I have many
> > observations and want to avoid slower methods. Here is my code for
> > the first friend.
> >
> > gen gpaf1 = .
> > egen group = group(friend1)
> > summarize group, meanonly
> > forvalues num = 1/`r(max)' {
> > summarize friend1 if group==`num', meanonly
> > local idf = r(mean)
> > summarize gpa if id==`idf', meanonly
> > replace gpaf1 = r(mean) if group==`num'
> > }
> >
> > I figure I can nest this in a forvalues loop from 1-4, and then use
> > -egen ... rowmean(gpaf1-gpaf4)- to get the mean over friends. In the
> > code above, levelsof could replace the -egen ... group(friend1)-,
> > but macro length limits would require splitting the friends' ids
> > into two to four groups.
> >
> > Is there a faster method, perhaps with Mata?
> >
> > (An additional wrinkle: some friends may no longer be in the
> > database, so an observation's friend1, for example, may contain a
> > number that is not the id of any observation. I think the code
> > above is robust to that problem, but perhaps this is another
> > potential speed improvement.)