

From |
Austin Nichols <austinnichols@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Identify 5 closest observations of a variable and then calculate average of another variable based on the observations identified |

Date |
Tue, 18 Sep 2012 10:47:50 -0400 |

Joseph Monte <hmjc66@gmail.com>:

The best way to approach this depends on the data size and structure. If you have easy data like below, you can -cross- and compute directly; for a large dataset, you may want to loop over observations (cf. e.g. http://www.stata.com/statalist/archive/2007-10/msg00346.html).

To loop over observations and sort repeatedly by distance based on one or more variables, it will behoove you to create a numeric id corresponding to the obs number at the outset, so you can re-sort when you are done with each iteration of the loop, which will make it easy to refer to a specific observation. Something like:

clear all
input str1 reg v1 v2
A 3.29515 47
A 5.39742 38
A 7.94641 43
A 11.25495 235
A 22.35908 61
A 27.19206 76
A 41.03306 66
A 45.56846 89
A 53.63861 116
A 73.2925 76
A 104.3025 63
A 229.7772 74
A 634.0973 61
A 1053.78 80
A 1163.681 47
B 2.339128 55
B 2.378151 46
B 9.831361 47
B 15.83442 57
B 16.48956 42
B 28.70144 44
B 56.01777 29
B 113.9736 103
B 178.731 47
B 340.715 103
C 0.5892565 44
C 2.016974 37
C 3.041719 76
C 4.009228 80
C 5.856674 51
C 7.587287 188
C 8.827202 66
C 11.53763 48
C 11.67932 152
C 11.86612 51
C 12.95344 84
C 14.85097 63
C 17.12918 47
C 17.74263 67
C 17.97567 75
C 20.60005 84
C 22.13938 44
C 28.99966 44
C 31.23538 55
C 31.52542 36
end
g long id=_n
g double m=.
forv i=1/`=_N' {
 sort id
 g d=(v1-v1[`i'])^2
 g noti=_n==`i'
 loc mr=reg[`i']
 bys noti reg (d): g f5=(_n<6) if reg=="`mr'"&noti==0
 qui count if f5==1
 if r(N)==5 {
  su v2 if f5==1, mean
  replace m=r(mean) if id==`i'
 }
 drop d noti f5
}
sort id
list, noo

On Mon, Sep 17, 2012 at 12:34 PM, Joseph Monte <hmjc66@gmail.com> wrote:
> Dear Statalisters,
>
> The data below shows three variables:- region, var1 and var2. For each
> observation in a given region, I want the 5 closest observations based
> on var1 (not counting the observation in question). I basically need
> the average value of var2 for the 5 observations that are identified.
> I don't have any missing values in my data for all three variables
> below. I can also confirm that I have a few regions with less than 6
> observations each; hence these regions will be ignored. I am using
> Stata 12.
>
> Thanks,
>
> Joe

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
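The reply mentions that with small data you can -cross- and compute directly, but only the loop is spelled out. A minimal sketch of what the -cross- route might look like follows. It is not the author's code: it assumes the example data (reg, v1, v2) are already in memory and that it is run from a do-file, and the names id2, reg2, v1b, v2b, d and n are made up for illustration.

```stata
* Sketch only: assumes reg, v1 and v2 from the example are in memory.
* id2, reg2, v1b, v2b, d and n are hypothetical names for this illustration.
g long id=_n
preserve
ren (id reg v1 v2) (id2 reg2 v1b v2b)
tempfile pairs
save `pairs'
restore
* form every pairwise combination of observations
cross using `pairs'
* keep same-region pairs and exclude each observation paired with itself
keep if reg==reg2 & id!=id2
g double d=abs(v1-v1b)
* five nearest neighbours on v1; ties broken by id2
bys id (d id2): keep if _n<=5
by id: g n=_N
collapse (mean) m=v2b (first) n, by(id reg v1 v2)
* regions with fewer than 6 observations yield fewer than 5 neighbours
replace m=. if n<5
list, noo
```

For the first observation in region A the five nearest values of v1 carry v2 = 38, 43, 235, 61 and 76, so both approaches should report m = 90.6 there. Two caveats: -cross- creates N^2 rows, so this only suits small datasets, and an observation that is alone in its region drops out entirely rather than being kept with m missing.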

