st: creating summary variables with overlapping peer groups

From   "Assistant, Research" <>
To   "" <>
Subject   st: creating summary variables with overlapping peer groups
Date   Mon, 29 Jul 2013 13:55:08 +0000

Dear colleagues,

I am looking for an efficient way to create, for each individual (observation), a variable containing her group mean excluding her own value. The complication is that each individual belongs to several groups. The problem arises in a neighborhood or peer-group analysis.

My data looks like the following.

ID      Cluster         neighbor1       neighbor2       variable1
A       1               2               3               1
B       1               2               3               0
C       2               5               1               1
D       2               5               1               1
E       3               1               4               1
F       4               3               5               0
G       5               2               4               0

ID identifies each individual; Cluster is the neighborhood in which the individual lives; neighbor1 is the nearest adjacent neighborhood; neighbor2 is the second-nearest adjacent neighborhood; variable1 is the variable whose peer-group mean I want to compute for each individual.

For example, individual A shares a neighborhood (Cluster 1) with individual B, and her adjacent neighborhoods (Clusters 2 and 3) contain individuals C, D, and E. The variable I want to generate is the mean of variable1 over this peer group, excluding the individual's own observation. variable1 is 0 for B and 1 for C, D, and E, so the mean I would like to generate for individual A is (0 + 1 + 1 + 1)/4 = 0.75. (For individual B it would be 1, and so on.)
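For anyone who wants to reproduce the example, the toy data above can be entered roughly as follows. This is only a sketch; the final lines check the hand calculation for individual A by hard-coding her cluster and neighbor numbers.

```stata
clear
input str1 ID Cluster neighbor1 neighbor2 variable1
A 1 2 3 1
B 1 2 3 0
C 2 5 1 1
D 2 5 1 1
E 3 1 4 1
F 4 3 5 0
G 5 2 4 0
end

* Peer group of A: B from her own cluster 1, plus everyone in clusters 2 and 3
summarize variable1 if (Cluster == 1 & ID != "A") | inlist(Cluster, 2, 3), meanonly
display r(mean)   // .75
```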

One solution, which unfortunately is very inefficient, is given by:

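* Loop over observations; for each one, average variable1 over its peer group
gen mean=.
forval i=1/`=_N' {
        * own cluster (excluding observation i) plus both adjacent clusters
        summarize variable1 if (Cluster==Cluster[`i'] & _n!=`i') | Cluster==neighbor1[`i'] ///
        | Cluster==neighbor2[`i'], meanonly
        quietly replace mean=r(mean) in `i'
}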

I am looking for an efficient way to do the above.

Furthermore, to refine the analysis, I would like to weight the own and adjacent neighborhoods in the calculation. For example, the own-neighborhood mean (without the own observation) would enter each individual's summary variable with a weight of 0.5, the neighbor1 mean with a weight of 0.3, and the neighbor2 mean with a weight of 0.2.
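To make the weighting concrete, here is a rough sketch in the same (inefficient) loop style as above. The variable name wmean and the scalar names are placeholders of mine, and the weights are simply the example values 0.5, 0.3, and 0.2.

```stata
* Sketch only: wmean, s_own, s_n1, s_n2 are hypothetical names
gen wmean = .
forvalues i = 1/`=_N' {
    * mean of the own neighborhood, excluding observation i
    summarize variable1 if Cluster == Cluster[`i'] & _n != `i', meanonly
    scalar s_own = r(mean)
    * mean of the nearest adjacent neighborhood
    summarize variable1 if Cluster == neighbor1[`i'], meanonly
    scalar s_n1 = r(mean)
    * mean of the second-nearest adjacent neighborhood
    summarize variable1 if Cluster == neighbor2[`i'], meanonly
    scalar s_n2 = r(mean)
    * weighted combination; missing if any of the three means is missing
    quietly replace wmean = 0.5*scalar(s_own) + 0.3*scalar(s_n1) + 0.2*scalar(s_n2) in `i'
}
```

For individual A in the toy data this would give 0.5*0 + 0.3*1 + 0.2*1 = 0.5. Individuals who are alone in their cluster (E, F, and G here) have no own-neighborhood mean, so the sketch leaves their value missing; how such cases should be treated is a separate question.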

A last extension I am interested in is restricting the observations that enter the summary variable to those within an age span around the individual's own age. For example, an individual aged 25 should count only individuals aged 22 to 28 as her peer group, so that only individuals in the own or adjacent neighborhoods who fall into this age span enter the calculation of the mean.
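Again in the loop style used above, the age restriction I have in mind would look roughly like this. The sketch assumes a numeric variable age, which is not part of the toy data shown, and a window of plus or minus 3 years as in the example; mean_age is a placeholder name.

```stata
* Sketch only: assumes a numeric variable age exists; mean_age is a hypothetical name
gen mean_age = .
forvalues i = 1/`=_N' {
    * same peer group as before, restricted to ages within 3 years of observation i
    * (inrange() includes both endpoints, so age 25 keeps ages 22 through 28)
    summarize variable1 if ((Cluster == Cluster[`i'] & _n != `i') ///
        | Cluster == neighbor1[`i'] | Cluster == neighbor2[`i']) ///
        & inrange(age, age[`i'] - 3, age[`i'] + 3), meanonly
    quietly replace mean_age = r(mean) in `i'
}
```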

I would be very grateful for your suggestions on how to solve this problem.

Best regards,

Maximilian Linek
Deutsches Evaluierungsinstitut der Entwicklungszusammenarbeit gGmbH;
Sitz der Gesellschaft Bonn/Registered Office Bonn, Germany;
Registergericht/Registered at Amtsgericht Bonn, Germany; Eintragungs-Nr./Registration no. HRB 19016;
USt-IdNr DE 280688706;
Geschäftsführung/Management:Prof. Dr. Helmut Asche
