Statalist The Stata Listserver



RE: st: Friends' characteristics


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Friends' characteristics
Date   Fri, 1 Sep 2006 19:29:58 +0100

I don't know about "the key". Early in the thread 
I offered a solution that is shorter and simpler 
than any based on -merge-. It might be slower, but 
I've yet to hear a report on that. 

Here is the code again. 

gen gpa_f = .

qui forval i = 1/`=_N' {
	// the long -summarize- command is split with /// so that it does not wrap
	su gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
		`=friend3[`i']', `=friend4[`i']'), meanonly
	replace gpa_f = r(mean) in `i'
}
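
To make the behaviour of that loop concrete, here is a small sketch on made-up data (the four observations and their values are invented for illustration, not taken from the thread). It assumes the code is run from a do-file, since /// continuation is not accepted interactively, and that id is never missing; a missing id would match a missing friend inside inlist().

```stata
clear
input id gpa friend1 friend2 friend3 friend4
1 3.0 2 3 . .
2 4.0 1 . . .
3 2.0 1 2 9 .
4 3.5 . . . .
end

gen gpa_f = .

quietly forvalues i = 1/`=_N' {
	// average gpa over the observations whose id is one of this row's friends
	summarize gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
		`=friend3[`i']', `=friend4[`i']'), meanonly
	replace gpa_f = r(mean) in `i'
}

list id gpa gpa_f
```

Friend 9 of respondent 3 is not in the data and so contributes nothing to the mean, and respondent 4, who lists no friends, is left with a missing gpa_f.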


Nick 
n.j.cox@durham.ac.uk 

Chris Ruebeck
 
> Thanks!  I see the key is using -rename- and -merge- .
 
Stas Kolenikov 

> > I would go with a merge, something like
> >
> > tempfile friend1 friend2 friend3 friend4
> >
> > preserve
> > keep id gpa
> > rename id friend
> > forvalues k=1/4 {
> >   rename friend friend`k'
> >   rename gpa gpa_f`k'
> >   // note that the abbreviation friend will be matched to friend1 when k==2, etc.
> >   sort friend`k'
> >   save `friend`k''
> > }
> > restore
> >
> > forvalues k=1/4 {
> >   sort friend`k'
> >   merge friend`k' using `friend`k'', nokeep
> >   drop _merge
> > }
> >
> > egen peer_gpa = rmean(gpa_f*)
> >
> > Of course I have not tried it working, but it should give you an idea.
> > I don't know if it is going to be much faster (and it very well might
> > be), but it is also somewhat clearer, I think.
> >
> > On 8/30/06, Chris Ruebeck <ruebeckc@lafayette.edu> wrote:
> >> (Previously sent but didn't see it appear on Statalist.)
> >>
> >> Suppose my data set has these 6 variables,
> >>
> >>         id : this respondent's ID,
> >>         gpa : this respondent's GPA, and
> >>         friend1-4 : the IDs (possibly missing) of this respondent's friends.
> >>
> >> I would like to create four new variables that record the GPA of each
> >> respondent's friends, and then take their average.  I have many
> >> observations and want to avoid slower methods.  Here is my code for
> >> the first friend.
> >>
> >> gen gpaf1 = .
> >> egen group = group(friend1)
> >> summarize group, meanonly
> >> forvalues num = 1 / `r(max)' {
> >>         summarize id if group==`num', meanonly
> >>         local idf = r(mean)
> >>         summarize gpa if id==`idf', meanonly
> >>         replace gpaf1 = r(mean) if group==`num'
> >> }
> >>
> >> I figure I can nest this in a forvalues loop from 1-4, and then use
> >> -egen ... rowmean(gpaf1-4)- to get the mean over friends.  In the code
> >> above, levelsof could replace the -egen ... group(friend1)- but macro
> >> length limits would require splitting the friends' ids into two to
> >> four groups.
> >>
> >> Is there a faster method, perhaps with Mata?
> >>
> >> (An additional wrinkle: some friends may no longer be in the
> >> database---so an observation's friend1, for example, may contain a
> >> number that is not the id of any observation.  I think the code above
> >> is robust to that problem, but perhaps this is another potential
> >> speed improvement.)
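
For comparison, the -merge- idea quoted above can also be sketched with the newer -merge m:1- syntax. This is only an illustration under stated assumptions: it needs Stata 11 or later (later than the release current when this thread was written), it assumes id is unique and never missing, and the toy data, the tempfile name lookup, and the variable peer_gpa are made up for the example.

```stata
clear
input id gpa friend1 friend2 friend3 friend4
1 3.0 2 3 . .
2 4.0 1 . . .
3 2.0 1 2 9 .
4 3.5 . . . .
end

// build a lookup file: one row per id, holding that person's gpa
tempfile lookup
preserve
keep id gpa
rename id friend
rename gpa gpa_f
save `lookup'
restore

// attach each friend's gpa in turn; unmatched lookup rows are not kept
forvalues k = 1/4 {
	rename friend`k' friend
	merge m:1 friend using `lookup', keep(master match) nogenerate
	rename friend friend`k'
	rename gpa_f gpa_f`k'
}

egen peer_gpa = rowmean(gpa_f?)
sort id
list id gpa peer_gpa
```

Friends who are missing or no longer in the data simply fail to match, so their gpa_f`k' stays missing and rowmean() ignores it. Whether this runs faster than the loop over observations would have to be timed on the real data.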



