# Re: st: calculating nearest neighbors; looping back to the beginning of observations

From: "Sarah Cohodes" <[email protected]>
To: [email protected]
Subject: Re: st: calculating nearest neighbors; looping back to the beginning of observations
Date: Wed, 10 Oct 2007 23:45:06 -0400

```
Austin,

Many thanks as usual for your guidance (and for the reminder that the
earth is not flat!).  I think the easiest way to make this method work
for me is going to be to create two datasets to facilitate comparing
each observation to every other observation.

As for the first 100: it was an arbitrary designation of how many
neighbors I wanted to slog through in the matching.  Some sort of
weight is more logical.  I'll investigate your paper for ideas along
those lines.

Thanks again,
Sarah

On 10/10/07, Austin Nichols <[email protected]> wrote:
> Sarah --
> Note the problem of hospitals and patients I referenced, though it
> illustrates the idea of looping over obs and calculating distance, is
> not exactly analogous--it involved two datasets, for one. But
> http://www.stata.com/statalist/archive/2007-01/msg00098.html
> is what I should have referenced, in any case.
>
> Also, it occurs to me: why the 100 nearest?  Why not weight by the
> reciprocal of the square of distance over all obs, or somesuch?  For a
> relevant discussion, see Appendix A of
> http://www.nber.org/papers/w13246
>
> On 10/10/07, Austin Nichols <[email protected]> wrote:
> > Sarah--
> > To identify the nearest 100 obs, you will need 100 new variables
> > holding the ID for each of those neighbors; then calculating the
> > additional variables will also be nontrivial.  Far better to calculate
> > whatever you need in a single loop over all observations.  See
> > http://www.stata.com/statalist/archive/2007-01/msg00079.html
> > for more detail.
> >
> > The key is to calculate for each i the distance to all _N-1 not-i obs
> > and then sort by distance and then calculate summary stats on the
> > first 100 obs with an in 1/100 qualification.  Also you might want to
> > calculate distance using a spherical approximation to the Earth's
> > surface (but see -findit vincenty- for an ellipsoidal approximation).
> >
> > On 10/10/07, Sarah Cohodes <[email protected]> wrote:
> > > Dear Statalisters:
> > >
> > > I have the longitude and latitude of each of my observations.  I'd
> > > like to identify the 100 nearest neighbors of each observation, so I
> > > can ultimately calculate some variables based on those nearest
> > > neighbors, for example the average test score of the 100 nearest
> > > neighbors.  I've identified  a strategy to do this, but I'm stuck
> > > along the way.  However, if someone has another suggestion on how to
> > > approach the problem, I'd really appreciate it, especially if it is
> > > less computationally intensive, as I have over 100,000 observations.
> > >
>
```
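
The single-loop approach described in the thread (for each observation i, compute the distance to every other observation, sort by distance, then summarize with an `in 1/100` qualifier) could be sketched in Stata roughly as below. This is only an illustration under stated assumptions, not code from the thread: the variable names `lat`, `lon`, `score`, `id`, `dist`, `w`, `nbr_mean` and `wt_mean` are hypothetical, the toy data are made up, the haversine formula with a 6371 km radius stands in for "a spherical approximation", and `k` is set to 2 so the toy example runs (it would be 100 in Sarah's setting). The same loop also computes the alternative Austin raises, a mean weighted by the reciprocal of squared distance over all other observations.

```stata
* Toy data (made up for illustration): five points on the equator.
* lat, lon, score are assumed variable names, in decimal degrees.
clear
input double(lat lon score)
0  0 10
0  1 20
0  2 30
0  3 40
0 10 50
end

local k = 2                  // number of neighbors; 100 in the thread (needs _N > k)
gen long id = _n             // stable identifier, since sorting reorders the data
gen double dist = .          // distance from the current obs, in km
gen double w = .             // inverse squared-distance weight
gen double nbr_mean = .      // mean score of the k nearest neighbors
gen double wt_mean = .       // weighted mean score over all other obs

local N = _N
forvalues i = 1/`N' {
    sort id                  // restore order so that obs `i' has id == `i'
    scalar lat0 = lat[`i']
    scalar lon0 = lon[`i']

    * Haversine great-circle distance (assumed Earth radius 6371 km)
    quietly replace dist = 2*6371*asin(min(1, sqrt(        ///
        sin((lat - lat0)*_pi/360)^2 +                      ///
        cos(lat*_pi/180)*cos(lat0*_pi/180)*                ///
        sin((lon - lon0)*_pi/360)^2)))
    quietly replace dist = . if id == `i'   // exclude obs i itself

    * Weighted mean over all not-i obs; a zero distance gives a missing
    * weight, so coincident points drop out of this mean
    quietly replace w = 1/dist^2
    summarize score [aweight = w], meanonly
    quietly replace wt_mean = r(mean) if id == `i'

    * Mean over the k nearest: missing dist sorts last, so obs i is excluded
    sort dist
    summarize score in 1/`k', meanonly
    quietly replace nbr_mean = r(mean) if id == `i'
}
sort id
list id lon score nbr_mean wt_mean
```

Each pass touches all observations, so the work grows with the square of `_N`; with over 100,000 observations this would likely run for a long time, which is one reason the weighting alternative (no sort needed) or a pre-filter on a latitude/longitude box might be worth considering.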