Chris,
Below is a bit of code (with naming conventions changed from the
original) that was provided to me previously by Maarten Buis and later
commented upon by Nick Cox. My question was essentially the same as
yours, and a small modification of this code helped immensely. The final
format will allow you to calculate a mean across your gpa variables.
The original conversation can be found here:
http://www.stata.com/statalist/archive/2006-03/msg00612.html
So, Nick is correct in that there is a -merge- solution.
Note: this creates one gpa variable for each nominated friend (gpa1 for
frnd1, gpa2 for frnd2, and so on).
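* lookup file: one observation per person, keyed on frnd
drop _all
input frnd gpa
99 2.5
88 3.1
77 4
66 1.8
55 3.6
44 2.9
end
sort frnd
save test, replace
* nomination file: one observation per respondent
drop _all
input aid frnd1 frnd2 frnd3 frnd4
99 66 77 . .
88 77 99 . .
77 55 44 99 .
66 88 99 44 77
55 44 . . .
44 66 . . .
end
* one observation per respondent-friend pair; the friend slot goes in _j
reshape long frnd, i(aid)
drop if frnd == .
sort frnd
* old-style match merge; Stata 11 and later also accept
* -merge m:1 frnd using test-
merge frnd using test
* drop people in the lookup file whom nobody nominated
drop if _merge == 2
drop _merge
* back to one observation per respondent, now with gpa1-gpa4 alongside
reshape wide frnd gpa, i(aid) j(_j)
list
save test2, replace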
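For the mean across friends, something like the following should work. It
is only a sketch: it assumes test2.dta was saved by the code above, and
gpa_f is just a name picked for illustration. The wildcard gpa* is used
instead of a range such as gpa1-gpa4 because -reshape wide- interleaves
the new variables (frnd1 gpa1 frnd2 gpa2 ...), so a range would also pick
up frnd2-frnd4.

```stata
* reload the wide file produced above
use test2, clear
* rowmean() averages the nonmissing values, so respondents with
* fewer than four friends are handled without extra code
egen gpa_f = rowmean(gpa*)
list aid gpa_f
```

For respondent 99, whose friends are 66 and 77, gpa_f should be the mean
of 1.8 and 4.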
Carter Rees
School of Criminal Justice
University at Albany, SUNY
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Wednesday, August 30, 2006 7:12 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Friends' characteristics
There is probably a -merge- solution.
In this case, at worst, a solution is a single
loop over observations.
gen gpa_f = .
qui forval i = 1/`=_N' {
    su gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
        `=friend3[`i']', `=friend4[`i']'), meanonly
    replace gpa_f = r(mean) in `i'
}
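To try the loop on a small example, a self-contained sketch follows. The
toy data are made up for illustration and use the variable names from the
original question (id, gpa, friend1-friend4). Because of the ///
continuation it needs to be run from a do-file rather than typed
interactively.

```stata
clear
input id gpa friend1 friend2 friend3 friend4
99 2.5 66 77 . .
88 3.1 77 99 . .
77 4.0 55 44 99 .
66 1.8 88 99 44 77
55 3.6 44 . . .
44 2.9 66 . . .
end
gen gpa_f = .
quietly forvalues i = 1/`=_N' {
    * mean gpa over the observations whose id is one of this row's friends;
    * a missing friend slot expands to . and matches no id
    summarize gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
        `=friend3[`i']', `=friend4[`i']'), meanonly
    replace gpa_f = r(mean) in `i'
}
list id gpa_f
```

A friend id that matches no observation contributes nothing to the mean,
and a respondent with no matching friends at all ends up with gpa_f
missing.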
If your ids are string, then you need instead
inlist(id, "`=friend1[`i']'", "`=friend2[`i']'", ///
    "`=friend3[`i']'", "`=friend4[`i']'")
Nick
n.j.cox@durham.ac.uk
Chris Ruebeck wrote:
> Suppose my data set has these 6 variables,
>
> id : this respondent's ID,
> gpa : this respondent's GPA, and
> friend1-4 : the IDs (possibly missing) of this
> respondent's friends.
>
> I would like to create four new variables that record the GPA of each
> respondent's friends, and then take their average. I have many
> observations and want to avoid slower methods. Here is my code for
> the first friend.
>
> gen gpaf1 = .
> egen group = group(friend1)
> summarize group, meanonly
> forvalues num = 1/`r(max)' {
>     summarize friend1 if group==`num', meanonly
>     local idf = r(mean)
>     summarize gpa if id==`idf', meanonly
>     replace gpaf1 = r(mean) if group==`num'
> }
>
> I figure I can nest this in a forvalues loop from 1-4, and then use
> -egen ... rowmean(gpaf1-gpaf4)- to get the mean over friends. In the
> code above, levelsof could replace the -egen ... group(friend1)-, but
> macro length limits would require splitting the friends' ids into two
> to four groups.
>
> Is there a faster method, perhaps with Mata?
>
> (An additional wrinkle: some friends may no longer be in the
> database---so an observation's friend1, for example, may contain a
> number that is not the id of any observation. I think the code above
> is robust to that problem, but perhaps this is another potential
> speed improvement.)
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/