From
"Austin Nichols" <austinnichols@gmail.com>

To
statalist@hsphsun2.harvard.edu

Subject
Re: st: calculating nearest neighbors; looping back to the beginning of observations

Date
Wed, 10 Oct 2007 18:34:52 -0400

Sarah --
Note the problem of hospitals and patients I referenced, though it
illustrates the idea of looping over obs and calculating distance, is
not exactly analogous--it involved two datasets, for one. But
http://www.stata.com/statalist/archive/2007-01/msg00098.html
is what I should have referenced, in any case.

Also, it occurs to me: why the 100 nearest? Why not weight by the
reciprocal of the square of distance over all obs, or somesuch? For a
relevant discussion, see Appendix A of
http://www.nber.org/papers/w13246

On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> Sarah--
> To identify the nearest 100 obs, you will need 100 new variables
> holding the ID for each of those neighbors; then calculating the
> additional variables will also be nontrivial. Far better to calculate
> whatever you need in a single loop over all observations. See
> http://www.stata.com/statalist/archive/2007-01/msg00079.html
> for more detail.
>
> The key is to calculate for each i the distance to all _N-1 not-i obs
> and then sort by distance and then calculate summary stats on the
> first 100 obs with an in 1/100 qualification. Also you might want to
> calculate distance using a spherical approximation to the Earth's
> surface (but see -findit vincenty- for an ellipsoidal approximation).
>
> On 10/10/07, Sarah Cohodes <sarah.cohodes@gmail.com> wrote:
> > Dear Statalisters:
> >
> > I have the longitude and latitude of each of my observations. I'd
> > like to identify the 100 nearest neighbors of each observation, so I
> > can ultimately calculate some variables based on those nearest
> > neighbors, for example the average test score of the 100 nearest
> > neighbors. I've identified a strategy to do this, but I'm stuck
> > along the way. However, if someone has another suggestion on how to
> > approach the problem, I'd really appreciate it, especially if it is
> > less computationally intensive, as I have over 100,000 observations.
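
The two ideas in this thread (for each observation, compute the distance to every other observation, sort, and summarize the nearest 100; or instead weight all observations by the reciprocal of squared distance) can be sketched roughly as follows. This is an illustration in Python, not Stata, and not the code from the archived posts linked above. The function names, the (lat, lon) tuple layout, and the choice to skip coincident points in the weighted version are all assumptions made for the example. Distance uses the haversine formula, one common spherical approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation only


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_k_mean(points, scores, k=100):
    """For each observation i, mean score of its k nearest other observations.

    points: list of (lat, lon) tuples (hypothetical layout)
    scores: list of numeric values, same length as points
    """
    n = len(points)
    means = []
    for i in range(n):
        lat_i, lon_i = points[i]
        # distance from i to all n-1 not-i observations
        dists = [
            (haversine_km(lat_i, lon_i, points[j][0], points[j][1]), j)
            for j in range(n) if j != i
        ]
        dists.sort()          # sort by distance
        nearest = dists[:k]   # analogous to an "in 1/k" restriction
        if nearest:
            means.append(sum(scores[j] for _, j in nearest) / len(nearest))
        else:
            means.append(float("nan"))
    return means


def inverse_square_weighted_mean(points, scores):
    """For each i, mean of all other scores weighted by 1 / distance^2.

    Observations at zero distance from i are skipped (an assumption made
    here to avoid dividing by zero; other treatments are possible).
    """
    n = len(points)
    out = []
    for i in range(n):
        lat_i, lon_i = points[i]
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue
            d = haversine_km(lat_i, lon_i, points[j][0], points[j][1])
            if d == 0:
                continue
            w = 1.0 / d ** 2
            num += w * scores[j]
            den += w
        out.append(num / den if den > 0 else float("nan"))
    return out
```

Both functions examine every pair of observations, so with over 100,000 observations that is on the order of 10^10 distance calculations; the sketch is meant to show the logic of the single loop, not to be fast.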

