
From: "Sarah Cohodes" <sarah.cohodes@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: calculating nearest neighbors; looping back to the beginning of observations

Date: Wed, 10 Oct 2007 23:45:06 -0400

Austin,

Many thanks as usual for your guidance (and for the reminder that the earth is not flat!). I think the easiest way to make this method work for me is going to be to create two datasets to facilitate comparing each observation to every other observation.

As for the first 100: it was an arbitrary designation of "neighborhood" -- already have thought about weighting, but first wanted to slog through the matching. Some sort of weight is more logical. I'll investigate your paper for ideas along those lines.

Thanks again,
Sarah

On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> Sarah --
> Note the problem of hospitals and patients I referenced, though it
> illustrates the idea of looping over obs and calculating distance, is
> not exactly analogous--it involved two datasets, for one. But
> http://www.stata.com/statalist/archive/2007-01/msg00098.html
> is what I should have referenced, in any case.
>
> Also, it occurs to me: why the 100 nearest? Why not weight by the
> reciprocal of the square of distance over all obs, or somesuch? For a
> relevant discussion, see Appendix A of
> http://www.nber.org/papers/w13246
>
> On 10/10/07, Austin Nichols <austinnichols@gmail.com> wrote:
> > Sarah--
> > To identify the nearest 100 obs, you will need 100 new variables
> > holding the ID for each of those neighbors; then calculating the
> > additional variables will also be nontrivial. Far better to calculate
> > whatever you need in a single loop over all observations. See
> > http://www.stata.com/statalist/archive/2007-01/msg00079.html
> > for more detail.
> >
> > The key is to calculate for each i the distance to all _N-1 not-i obs
> > and then sort by distance and then calculate summary stats on the
> > first 100 obs with an in 1/100 qualification. Also you might want to
> > calculate distance using a spherical approximation to the Earth's
> > surface (but see -findit vincenty- for an ellipsoidal approximation).
> >
> > On 10/10/07, Sarah Cohodes <sarah.cohodes@gmail.com> wrote:
> > > Dear Statalisters:
> > >
> > > I have the longitude and latitude of each of my observations. I'd
> > > like to identify the 100 nearest neighbors of each observation, so I
> > > can ultimately calculate some variables based on those nearest
> > > neighbors, for example the average test score of the 100 nearest
> > > neighbors. I've identified a strategy to do this, but I'm stuck
> > > along the way. However, if someone has another suggestion on how to
> > > approach the problem, I'd really appreciate it, especially if it is
> > > less computationally intensive, as I have over 100,000 observations.
> > >
>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
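
For readers following the thread, the loop Austin describes (for each observation, compute the distance to every other observation, sort, and summarize the nearest ones) can be sketched roughly as below. This is an illustrative Python sketch, not the Stata code from the linked posts. The names `haversine_km` and `neighbor_stats`, the `(lat, lon, score)` tuple layout, and the mean Earth radius of 6371 km are all assumptions made for the example. It uses the haversine formula as one possible spherical approximation, and it also computes the inverse-square-distance weighted mean that Austin suggests as an alternative to a fixed cutoff of 100.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius for the spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def neighbor_stats(obs, k=100):
    """obs is a list of (lat, lon, score) tuples (hypothetical layout).

    Returns one (knn_mean, idw_mean) pair per observation:
      knn_mean - mean score of the k nearest other observations
      idw_mean - mean score of all other observations, weighted by 1/distance^2
    """
    results = []
    for i, (lat_i, lon_i, _) in enumerate(obs):
        # Distance from observation i to every not-i observation
        pairs = [(haversine_km(lat_i, lon_i, lat_j, lon_j), score_j)
                 for j, (lat_j, lon_j, score_j) in enumerate(obs) if j != i]
        pairs.sort(key=lambda p: p[0])

        # Summary statistic on the first k observations after sorting
        nearest = pairs[:k]
        knn_mean = (sum(s for _, s in nearest) / len(nearest)
                    if nearest else float("nan"))

        # Inverse-square weights; co-located points (distance 0) are skipped
        # here, which is a simplifying assumption of this sketch
        weighted = [(1.0 / d ** 2, s) for d, s in pairs if d > 0]
        wsum = sum(w for w, _ in weighted)
        idw_mean = (sum(w * s for w, s in weighted) / wsum
                    if wsum > 0 else float("nan"))

        results.append((knn_mean, idw_mean))
    return results
```

Called as `neighbor_stats(obs, k=100)`, this compares every pair of observations, so with over 100,000 observations it would perform on the order of 10^10 distance calculations. A spatial index or a blocking step would likely be needed at that scale.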

