
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: creating summary variables with overlapping peer groups

From   Nick Cox <>
To   "" <>
Subject   Re: st: creating summary variables with overlapping peer groups
Date   Tue, 30 Jul 2013 10:11:59 +0100

On why the spelling is not "STATA", please do read the Statalist FAQ.
There is no -summary- command; it's called -summarize-.

It's important to note that in the simplest case -- the mean of others
in the same group -- there is no need to loop over observations, as
the FAQ cited does explain.
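A minimal sketch of that simplest case, assuming a single grouping variable and made-up toy data (the names total, n and others_mean are illustrative, not from the thread): the mean of the others in a group is the group total minus the own value, divided by the group count minus one.

```stata
clear
input str1 ID Cluster variable1
A 1 1
B 1 0
C 2 1
D 2 1
E 3 1
end

* group total and number of nonmissing values of variable1
bysort Cluster: egen double total = total(variable1)
bysort Cluster: egen n = count(variable1)

* subtract the own value; a group of one gives missing (division by zero)
gen double others_mean = (total - variable1) / (n - 1)

list ID Cluster variable1 others_mean
```

No loop over observations is needed here because every member of a group shares the same total and count.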

Whenever there are multiple criteria leading to overlapping groups,
looping over observations is hard to avoid, however. But your code can
surely be speeded up.

gen mean=.

qui forval i=1/`=_N' {
        su variable1 if (Cluster==Cluster[`i'] & _n!=`i') | ///
        Cluster==neighbor1[`i'] | Cluster==neighbor2[`i'], meanonly

        replace mean = r(mean) in `i'
}

The speed-ups here:

1. -egen- is slow.
2. No need to create new variables.
3. -meanonly- speeds up -summarize-.
4. -in- is always faster than the equivalent -if-. Stata goes straight
to the observation in question, rather than looping over all of them.
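As a check, here is a sketch that runs this kind of loop on the seven observations from the original question (quoted further down). The variable name peer_mean is an assumption made for this example, chosen so the new variable is not itself called mean.

```stata
clear
input str1 ID Cluster neighbor1 neighbor2 variable1
A 1 2 3 1
B 1 2 3 0
C 2 5 1 1
D 2 5 1 1
E 3 1 4 1
F 4 3 5 0
G 5 2 4 0
end

gen double peer_mean = .

quietly forvalues i = 1/`=_N' {
    * others in own cluster, plus everyone in the two neighboring clusters
    summarize variable1 if (Cluster == Cluster[`i'] & _n != `i') | ///
        Cluster == neighbor1[`i'] | Cluster == neighbor2[`i'], meanonly
    * -in- addresses observation i directly
    replace peer_mean = r(mean) in `i'
}

list ID Cluster peer_mean
```

On these data the result is 0.75 for A and 1 for B, matching the values worked out by hand in the question.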

By the way, I didn't re-read your original, but check whether you really mean

if _n != `i' & (Cluster==Cluster[`i']  |  Cluster==neighbor1[`i'] | ///
Cluster==neighbor2[`i'])


On 30 July 2013 09:44, Assistant, Research <> wrote:
> Dear Nick,
> thank you for your response. I programmed it the way suggested in the link you sent me. However, this way is just as slow as the one using the summary command. I guess there is no quicker way as STATA needs to run through every single observation. If you have another suggestion I would be grateful for your reply. The code I wrote this morning looks like:
> gen idnum=_n
> gen mean=.
> qui forval i=1/`=_N' {
> gen include=1 if (Cluster==Cluster[`i'] & _n!=`i') | Cluster==neighbor1[`i'] | Cluster==neighbor2[`i']
> egen average=mean(variable1*include)
> replace mean=average if idnum==`i'
> drop include average
> }
> All the best,
> Max
> -----Original Message-----
> From: [] On behalf of Nick Cox
> Sent: Monday, 29 July 2013 16:48
> To:
> Subject: Re: st: creating summary variables with overlapping peer groups
> Start with
> Nick
> On 29 July 2013 14:55, Maximilian Linek
>> I am looking for an efficient way to create a variable for each individual (observation), which contains her group mean without her own value. However, each individual forms part in several groups. The problem is posed in a neighborhood or peer group analysis.
>> My data looks like the following.
>> ID      Cluster         neighbor1       neighbor2       variable1
>> A       1               2               3               1
>> B       1               2               3               0
>> C       2               5               1               1
>> D       2               5               1               1
>> E       3               1               4               1
>> F       4               3               5               0
>> G       5               2               4               0
>> ID identifies each individual; Cluster is the neighborhood in which an individual lives; neighbor1 is the nearest adjacent neighborhood; neighbor2 is the second-closest adjacent neighborhood; variable1 is the variable whose mean I want to generate for each individual.
>> Here individual A is in the same neighborhood as individual B and in an adjacent neighborhood to individuals C, D, and E. The variable I want to generate is the mean of this peer group without the own observation. The value of variable1 is 0 for B and 1 for C, D, and E, so the mean I would like to generate for individual A is 0.75. (The same for individual B would be 1, and so on...)
>> One solution, which unfortunately is very inefficient, is given by:
>> gen mean=.
>> forval i=1/`=_N' {
>>         summarize variable1 if (Cluster==Cluster[`i'] & _n!=`i') | Cluster==neighbor1[`i'] ///
>>         | Cluster==neighbor2[`i'], meanonly
>>         quietly replace mean=r(mean) in `i'
>> }
>> I am looking for an efficient way to do the above.
>> Furthermore, to refine the above analysis I would like to weight the impact of the own and the adjacent neighborhoods in the calculation. For example, the own neighborhood mean (without the own observation) enters the summary variable calculation for each individual with a weight of 0.5, the neighbor1 mean with a weight of 0.3 and the neighbor2 mean with a weight of 0.2.
>> A last extension, which I am interested in, is how the observations entering the calculation of the summary variable can be confined to observations which fall into an age span around an individual's own age. That means: an individual aged 25 shall consider only individuals that are between 22 and 28 as her  peer group and only individuals in the own or adjacent neighborhood which fall into this age span are considered in the calculation of the mean.
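The weighted version described in the question is not worked out in the thread. One possible sketch, under stated assumptions, computes the three neighborhood means separately for each observation and combines them with the weights 0.5, 0.3 and 0.2. The names wmean, s_own, s_n1 and s_n2 are hypothetical, and the sketch assumes that an individual with nobody else in her own neighborhood should get a missing value.

```stata
clear
input str1 ID Cluster neighbor1 neighbor2 variable1
A 1 2 3 1
B 1 2 3 0
C 2 5 1 1
D 2 5 1 1
E 3 1 4 1
F 4 3 5 0
G 5 2 4 0
end

gen double wmean = .

quietly forvalues i = 1/`=_N' {
    * own neighborhood, excluding observation i
    summarize variable1 if Cluster == Cluster[`i'] & _n != `i', meanonly
    scalar s_own = r(mean)
    * nearest and second-closest neighborhoods
    summarize variable1 if Cluster == neighbor1[`i'], meanonly
    scalar s_n1 = r(mean)
    summarize variable1 if Cluster == neighbor2[`i'], meanonly
    scalar s_n2 = r(mean)
    * any missing component makes the weighted mean missing (assumption)
    replace wmean = 0.5*scalar(s_own) + 0.3*scalar(s_n1) + 0.2*scalar(s_n2) in `i'
}

list ID Cluster wmean
```

The age restriction in the last paragraph could presumably be handled the same way, by adding a condition on the age difference to each -if- qualifier; that depends on an age variable the example data do not contain.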
