Statalist



Re: st: calculating nearest neighbors; looping back to the beginning of observations


From   "Sarah Cohodes" <sarah.cohodes@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: calculating nearest neighbors; looping back to the beginning of observations
Date   Wed, 10 Oct 2007 23:45:06 -0400

Austin,

Many thanks as usual for your guidance (and for the reminder that the
earth is not flat!).  I think the easiest way to make this method work
for me is going to be to create two datasets to facilitate comparing
each observation to every other observation.

As for the first 100: it was an arbitrary designation of
"neighborhood" -- I have already thought about weighting, but first
wanted to slog through the matching.  Some sort of weight is more
logical.  I'll investigate your paper for ideas along those lines.

Thanks again,
Sarah
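
For illustration, the weighting idea raised in the message quoted below (weight each observation by the reciprocal of its squared distance, over all observations rather than the nearest 100) could be sketched roughly as follows. This is Python rather than Stata, and the function name and the choice to skip zero-distance observations are assumptions made for the sketch, not anything specified in the thread.

```python
def inverse_square_weighted_mean(distances, scores):
    """Mean of `scores` weighted by 1 / distance**2.

    distances: distances from the focal observation to each other observation.
    scores:    the value to average (e.g. a test score), in the same order.
    Observations at distance 0 are skipped to avoid division by zero
    (an assumption of this sketch; co-located points may deserve other handling).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for d, s in zip(distances, scores):
        if d <= 0:
            continue  # skip co-located observations
        w = 1.0 / (d * d)
        weighted_sum += w * s
        weight_total += w
    # Return NaN when no observation contributed a weight.
    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```

For example, with distances [1, 2] and scores [10, 20], the weights are 1 and 0.25, so the nearer observation dominates the average.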

On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> Sarah --
> Note the problem of hospitals and patients I referenced, though it
> illustrates the idea of looping over obs and calculating distance, is
> not exactly analogous--it involved two datasets, for one. But
> http://www.stata.com/statalist/archive/2007-01/msg00098.html
> is what I should have referenced, in any case.
>
> Also, it occurs to me: why the 100 nearest?  Why not weight by the
> reciprocal of the square of distance over all obs, or somesuch?  For a
> relevant discussion, see Appendix A of
> http://www.nber.org/papers/w13246
>
> On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> > Sarah--
> > To identify the nearest 100 obs, you will need 100 new variables
> > holding the ID for each of those neighbors; then calculating the
> > additional variables will also be nontrivial.  Far better to calculate
> > whatever you need in a single loop over all observations.  See
> > http://www.stata.com/statalist/archive/2007-01/msg00079.html
> > for more detail.
> >
> > The key is to calculate for each i the distance to all _N-1 not-i obs
> > and then sort by distance and then calculate summary stats on the
> > first 100 obs with an in 1/100 qualification.  Also you might want to
> > calculate distance using a spherical approximation to the Earth's
> > surface (but see -findit vincenty- for an ellipsoidal approximation).
> >
> > On 10/10/07, Sarah Cohodes <sarah.cohodes@gmail.com> wrote:
> > > Dear Statalisters:
> > >
> > > I have the longitude and latitude of each of my observations.  I'd
> > > like to identify the 100 nearest neighbors of each observation, so I
> > > can ultimately calculate some variables based on those nearest
> > > neighbors, for example the average test score of the 100 nearest
> > > neighbors.  I've identified  a strategy to do this, but I'm stuck
> > > along the way.  However, if someone has another suggestion on how to
> > > approach the problem, I'd really appreciate it, especially if it is
> > > less computationally intensive, as I have over 100,000 observations.
> > >
>
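
A minimal sketch of the procedure described in the quoted messages above: for each observation, compute the distance to every other observation, sort by distance, and summarize the nearest k. It is written in Python rather than Stata for illustration only. The names haversine_km and knn_mean, the (lat, lon, score) tuple layout, and the 6371 km mean Earth radius are assumptions of the sketch. The distance uses the haversine formula, which is one spherical approximation and not the ellipsoidal method mentioned in the thread.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def knn_mean(points, k):
    """For each observation, average `score` over its k nearest other observations.

    points: list of (lat, lon, score) tuples (hypothetical layout).
    Returns one mean per observation, in input order.
    """
    means = []
    for i, (lat_i, lon_i, _) in enumerate(points):
        # Distance from observation i to every other observation j.
        dists = [
            (haversine_km(lat_i, lon_i, lat_j, lon_j), score_j)
            for j, (lat_j, lon_j, score_j) in enumerate(points)
            if j != i
        ]
        dists.sort(key=lambda pair: pair[0])
        nearest = dists[:k]  # analogous to an "in 1/k" qualification
        if nearest:
            means.append(sum(s for _, s in nearest) / len(nearest))
        else:
            means.append(float("nan"))
    return means
```

Note that this brute-force loop performs on the order of N squared distance calculations, so with more than 100,000 observations it would likely be slow without some form of spatial indexing or blocking.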