# Re: st: calculating nearest neighbors; looping back to the beginning of observations

From: "Austin Nichols"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: calculating nearest neighbors; looping back to the beginning of observations
Date: Wed, 10 Oct 2007 18:34:52 -0400

```
Sarah --
Note that the hospitals-and-patients problem I referenced, though it
illustrates the idea of looping over obs and calculating distance, is
not exactly analogous: it involved two datasets, for one. In any case,
http://www.stata.com/statalist/archive/2007-01/msg00098.html
is what I should have referenced.

Also, it occurs to me: why the 100 nearest?  Why not weight by the
reciprocal of the square of distance over all obs, or somesuch?  For a
relevant discussion, see Appendix A of
http://www.nber.org/papers/w13246

On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> Sarah--
> To identify the nearest 100 obs, you will need 100 new variables
> holding the ID for each of those neighbors; then calculating the
> additional variables will also be nontrivial.  Far better to calculate
> whatever you need in a single loop over all observations.  See
> http://www.stata.com/statalist/archive/2007-01/msg00079.html
> for more detail.
>
> The key is to calculate for each i the distance to all _N-1 not-i obs
> and then sort by distance and then calculate summary stats on the
> first 100 obs with an in 1/100 qualification.  Also you might want to
> calculate distance using a spherical approximation to the Earth's
> surface (but see -findit vincenty- for an ellipsoidal approximation).
>
> On 10/10/07, Sarah Cohodes <sarah.cohodes@gmail.com> wrote:
> > Dear Statalisters:
> >
> > I have the longitude and latitude of each of my observations.  I'd
> > like to identify the 100 nearest neighbors of each observation, so I
> > can ultimately calculate some variables based on those nearest
> > neighbors, for example the average test score of the 100 nearest
> > neighbors.  I've identified a strategy to do this, but I'm stuck
> > along the way.  However, if someone has another suggestion on how to
> > approach the problem, I'd really appreciate it, especially if it is
> > less computationally intensive, as I have over 100,000 observations.
> >
```
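The two ideas discussed in the thread can be illustrated with a short sketch. One is to loop over observations, compute the distance from each observation to all others, sort, and summarize the nearest k. The other is to weight all observations by the reciprocal of squared distance. The sketch below is written in Python rather than Stata so that it runs standalone, and it is not the code from the referenced archive posts. The function names (`haversine_km`, `knn_mean`, `idw_mean`), the mean Earth radius of 6371 km, the haversine formula as the spherical approximation, and the toy data are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for the spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def knn_mean(obs, i, k):
    """Mean score of the k observations nearest to obs[i], excluding obs[i] itself."""
    lat_i, lon_i, _ = obs[i]
    # Distance from observation i to every other observation (the _N-1 not-i obs).
    dists = [
        (haversine_km(lat_i, lon_i, lat, lon), score)
        for j, (lat, lon, score) in enumerate(obs)
        if j != i
    ]
    dists.sort(key=lambda pair: pair[0])  # sort by distance
    nearest = dists[:k]                   # analogous to "in 1/k"
    if not nearest:
        return float("nan")
    return sum(score for _, score in nearest) / len(nearest)


def idw_mean(obs, i, power=2):
    """Mean score over all other observations, weighted by 1 / distance**power."""
    lat_i, lon_i, _ = obs[i]
    num = den = 0.0
    for j, (lat, lon, score) in enumerate(obs):
        if j == i:
            continue
        d = haversine_km(lat_i, lon_i, lat, lon)
        if d == 0:
            continue  # co-located point: skipped here to avoid dividing by zero (a simplification)
        w = 1.0 / d ** power
        num += w * score
        den += w
    return num / den if den > 0 else float("nan")


# Hypothetical toy data: (latitude, longitude, test score)
obs = [(0.0, 0.0, 10.0), (0.0, 1.0, 20.0), (0.0, 2.0, 30.0), (0.0, 10.0, 100.0)]
nearest_two = knn_mean(obs, 0, 2)  # average score of the 2 nearest neighbors of obs[0]
weighted = idw_mean(obs, 0)        # inverse-square-distance weighted average for obs[0]
```

Calling `knn_mean` for every `i` computes every pairwise distance, so with over 100,000 observations this brute-force version needs roughly 10^10 distance calculations. It is meant to show the logic and is not an efficient implementation.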