
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: calculating nearest neighbors; looping back to the beginning of observations
Date: Wed, 10 Oct 2007 18:21:39 -0400

Sarah--
To identify the nearest 100 obs, you will need 100 new variables holding the ID for each of those neighbors; then calculating the additional variables will also be nontrivial. Far better to calculate whatever you need in a single loop over all observations. See http://www.stata.com/statalist/archive/2007-01/msg00079.html for more detail. The key is to calculate for each i the distance to all _N-1 not-i obs and then sort by distance and then calculate summary stats on the first 100 obs with an in 1/100 qualification. Also you might want to calculate distance using a spherical approximation to the Earth's surface (but see -findit vincenty- for an ellipsoidal approximation).

On 10/10/07, Sarah Cohodes <sarah.cohodes@gmail.com> wrote:
> Dear Statalisters:
>
> I have the longitude and latitude of each of my observations. I'd
> like to identify the 100 nearest neighbors of each observation, so I
> can ultimately calculate some variables based on those nearest
> neighbors, for example the average test score of the 100 nearest
> neighbors. I've identified a strategy to do this, but I'm stuck
> along the way. However, if someone has another suggestion on how to
> approach the problem, I'd really appreciate it, especially if it is
> less computationally intensive, as I have over 100,000 observations.
>
> Here's my strategy:
> 1. determine the distance between i and the next 101 j observations
> 2. determine the maximum distance of these 101 distances
> 3. replace the max distance with the 101st distance if the 101st
> distance is not the largest distance
> 4. recalculate the 101st distance with 102nd distance (etc. etc.) and
> keep if it is smaller than one of the first 100 distances and toss if
> not
>
> My relevant code so far:
>
> #delimit;
> *make 101 id and distance variables, fill in with first 101 id's and distances;
> foreach n of numlist 1/101{;
> gen id`n'=.;
> gen dist`n'=.;
> replace id`n'=id[_n+`n'];
> replace dist`n'=sqrt(
> ((longitude-longitude[_n+`n'])^2)+((latitude-latitude[_n+`n'])^2));
> *deal with last cases;
> };
>
> *find the maximum distance;
> egen maxdist=rowmax(dist*)
>
> *replace the max distance and corresponding id with the 101st distance and id if the 101st distance is less than the max;
> foreach n of numlist 1/101{;
> replace id`n'=id101 if dist`n'==maxdist;
> replace dist`n'=dist101 if dist`n'==maxdist;
> };
>
> I haven't written the code yet that loops through observations 102 to
> _N, because I need to address my issue first. My problem is dealing
> with the final 100 observations and testing observations that are
> above the current observation -- essentially I want my loop to go
> "beyond" _N and return to the first and subsequent observations until
> the ith observation within the same loop. If I don't do this, I get
> missing data in the last 100 observations, and cannot test the
> distance between an observation and an earlier numbered observation.
> Suggestions on how to do this?
>
> Or better yet, suggestions on a better way to approach the issue as a whole?
>
> Thanks very much.
> Sarah
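A minimal sketch of the single-loop approach described in the reply at the top of this message: for each observation, compute the distance to every other observation, sort by distance, and summarize the nearest k with an -in- qualifier. This is an illustration under stated assumptions, not code from the thread. The variable names id, latitude and longitude follow the quoted code; score, obsno, d, nbr_mean and the local k are hypothetical names, and the five-row dataset is invented so the do-file runs on its own. The distance uses the haversine formula as one possible spherical approximation, with an assumed mean Earth radius of 6371 km.

```stata
* Sketch only: toy data and the names score, obsno, d, nbr_mean, k are
* assumptions for illustration; they do not come from the original post.
clear
input id latitude longitude score
1 0 0 10
2 0 1 20
3 0 2 30
4 0 10 40
5 0 11 50
end

* k = number of neighbors (100 in the question; 2 here for the toy data)
local k = 2
local N = _N

* remember the original order so each pass can find observation i again
gen long obsno = _n
gen double d = .
gen double nbr_mean = .

forvalues i = 1/`N' {
    * restore original order so that row i is observation i
    sort obsno
    local lat0 = latitude[`i']
    local lon0 = longitude[`i']

    * haversine great-circle distance in km (assumed radius 6371 km);
    * min(1, .) guards asin() against rounding slightly above 1
    quietly replace d = 2*6371*asin(min(1, sqrt(            ///
        sin((latitude - `lat0')*_pi/360)^2 +                ///
        cos(latitude*_pi/180)*cos(`lat0'*_pi/180)*          ///
        sin((longitude - `lon0')*_pi/360)^2)))

    * exclude i itself: missing values sort last
    quietly replace d = . in `i'

    * nearest neighbors come first; obsno breaks ties reproducibly
    sort d obsno
    summarize score in 1/`k', meanonly
    quietly replace nbr_mean = r(mean) if obsno == `i'
}

sort obsno
list id score nbr_mean
```

Because every pass touches all _N observations, no wrap-around past _N is needed, and no id1-id100 variables are created. The cost is still on the order of _N squared distance calculations plus two sorts per pass, so with over 100,000 observations this will likely run for a long time; restricting each pass to observations within some bounding box around i is one possible way to cut that down.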

