Statalist


Re: st: calculating nearest neighbors; looping back to the beginning of observations


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: calculating nearest neighbors; looping back to the beginning of observations
Date   Wed, 10 Oct 2007 18:34:52 -0400

Sarah --
Note the problem of hospitals and patients I referenced, though it
illustrates the idea of looping over obs and calculating distance, is
not exactly analogous--it involved two datasets, for one. But
http://www.stata.com/statalist/archive/2007-01/msg00098.html
is what I should have referenced, in any case.

Also, it occurs to me: why the 100 nearest?  Why not weight by the
reciprocal of the square of distance over all obs, or somesuch?  For a
relevant discussion, see Appendix A of
http://www.nber.org/papers/w13246
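
As a rough illustration of the weighting idea above, the sketch below computes, for each observation, a mean of a test score over all other observations weighted by the reciprocal of squared distance. It is only a sketch under stated assumptions: the variable names lat, lon (decimal degrees) and score are hypothetical, distance uses a spherical approximation with an assumed Earth radius of 6371 km, and the new variables d, w, and wmean must not already exist.

```stata
* Sketch only: lat, lon (decimal degrees) and score are hypothetical names.
gen double d     = .
gen double w     = .
gen double wmean = .
local N   = _N
local R   = 6371          // assumed mean Earth radius in km
local rad = _pi/180       // degrees to radians
forvalues i = 1/`N' {
    quietly {
        * coordinates of observation i (the data are never re-sorted here)
        local lat_i = lat[`i']
        local lon_i = lon[`i']
        * great-circle distance from i to every observation
        * (spherical law of cosines; argument clamped to [-1, 1])
        replace d = `R' * acos(max(-1, min(1,                    ///
            sin(`lat_i'*`rad') * sin(lat*`rad') +                ///
            cos(`lat_i'*`rad') * cos(lat*`rad') *                ///
            cos((lon - `lon_i')*`rad'))))
        replace d = . in `i'          // exclude i from its own average
        * weight = 1/distance^2; 1/0 is missing in Stata, so
        * co-located observations drop out of the weighted mean
        replace w = 1/d^2
        summarize score [aweight = w], meanonly
        replace wmean = r(mean) in `i'
    }
}
drop d w
```

Because no sorting is needed, each pass is a handful of -replace- statements over the data, but the loop still makes _N passes, so it will take a while with a large number of observations.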

On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> Sarah--
> To identify the nearest 100 obs, you will need 100 new variables
> holding the ID for each of those neighbors; then calculating the
> additional variables will also be nontrivial.  Far better to calculate
> whatever you need in a single loop over all observations.  See
> http://www.stata.com/statalist/archive/2007-01/msg00079.html
> for more detail.
>
> The key is to calculate for each i the distance to all _N-1 not-i obs
> and then sort by distance and then calculate summary stats on the
> first 100 obs with an in 1/100 qualification.  Also you might want to
> calculate distance using a spherical approximation to the Earth's
> surface (but see -findit vincenty- for an ellipsoidal approximation).
>
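
The loop described above (distance from i to all other obs, sort by distance, summarize in 1/100) might be sketched as follows. This is an illustration under assumptions, not code from the referenced posts: lat, lon (decimal degrees) and score are hypothetical variable names, the Earth radius of 6371 km is an assumed value for the spherical approximation, and the data are assumed to contain more than 100 observations.

```stata
* Sketch only: lat, lon (decimal degrees) and score are hypothetical names.
gen long   id      = _n      // stable identifier, since the data get re-sorted
gen double d       = .
gen double nbrmean = .
local N   = _N
local k   = 100              // number of nearest neighbors
local R   = 6371             // assumed mean Earth radius in km
local rad = _pi/180          // degrees to radians
forvalues i = 1/`N' {
    quietly {
        * coordinates of observation i, looked up by id
        summarize lat if id == `i', meanonly
        local lat_i = r(mean)
        summarize lon if id == `i', meanonly
        local lon_i = r(mean)
        * great-circle distance from i to every observation
        * (spherical law of cosines; argument clamped to [-1, 1])
        replace d = `R' * acos(max(-1, min(1,                    ///
            sin(`lat_i'*`rad') * sin(lat*`rad') +                ///
            cos(`lat_i'*`rad') * cos(lat*`rad') *                ///
            cos((lon - `lon_i')*`rad'))))
        replace d = . if id == `i'   // missing sorts last, so i is not its own neighbor
        sort d
        * summary statistic over the k nearest observations
        summarize score in 1/`k', meanonly
        replace nbrmean = r(mean) if id == `i'
    }
}
sort id
drop d
```

Note that -summarize- skips missing values of score, so a neighbor with a missing score reduces the number of obs averaged rather than being replaced by the next nearest. Each pass re-sorts the whole dataset, so with over 100,000 observations this will run for a long time.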
> On 10/10/07, Sarah Cohodes <sarah.cohodes@gmail.com> wrote:
> > Dear Statalisters:
> >
> > I have the longitude and latitude of each of my observations.  I'd
> > like to identify the 100 nearest neighbors of each observation, so I
> > can ultimately calculate some variables based on those nearest
> > neighbors, for example the average test score of the 100 nearest
> > neighbors.  I've identified  a strategy to do this, but I'm stuck
> > along the way.  However, if someone has another suggestion on how to
> > approach the problem, I'd really appreciate it, especially if it is
> > less computationally intensive, as I have over 100,000 observations.
> >


