
From: "Sarah Cohodes" <sarah.cohodes@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: calculating nearest neighbors; looping back to the beginning of observations
Date: Wed, 10 Oct 2007 17:39:14 -0400

Dear Statalisters:

I have the longitude and latitude of each of my observations. I'd like to identify the 100 nearest neighbors of each observation, so I can ultimately calculate some variables based on those nearest neighbors, for example the average test score of the 100 nearest neighbors. I've identified a strategy to do this, but I'm stuck along the way. However, if someone has another suggestion on how to approach the problem, I'd really appreciate it, especially if it is less computationally intensive, as I have over 100,000 observations.

Here's my strategy:

1. determine the distance between i and the next 101 j observations
2. determine the maximum distance of these 101 distances
3. replace the max distance with the 101st distance if the 101st distance is not the largest distance
4. recalculate the 101st distance using the 102nd observation (etc. etc.), keeping it if it is smaller than one of the first 100 distances and tossing it if not

My relevant code so far:

    #delimit;
    *make 101 id and distance variables, fill in with first 101 id's and distances;
    foreach n of numlist 1/101{;
        gen id`n'=.;
        gen dist`n'=.;
        replace id`n'=id[_n+`n'];
        replace dist`n'=sqrt( ((longitude-longitude[_n+`n'])^2)+((latitude-latitude[_n+`n'])^2));
        *deal with last cases;
    };
    *find the maximum distance;
    egen maxdist=rowmax(dist*);
    *replace the max distance and corresponding id with the 101st distance and id if the 101st distance is less than the max;
    foreach n of numlist 1/101{;
        replace id`n'=id101 if dist`n'==maxdist;
        replace dist`n'=dist101 if dist`n'==maxdist;
    };

I haven't written the code yet that loops through observations 102 to _N, because I need to address my issue first. My problem is dealing with the final 100 observations and testing observations that are above the current observation. Essentially, I want my loop to go "beyond" _N and return to the first and subsequent observations, continuing until it reaches the ith observation, all within the same loop. If I don't do this, I get missing data in the last 100 observations, and cannot test the distance between an observation and an earlier-numbered observation.

Suggestions on how to do this? Or better yet, suggestions on a better way to approach the issue as a whole?

Thanks very much.

Sarah

********************************
Sarah Cohodes
Project for Policy Innovation in Education
Harvard Graduate School of Education
617.496.3408 (phone)
617.495.2614 (fax)
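One possible way to get the wrap-around described in the post is to compute the subscript with `mod()`, so that offsets running past `_N` continue from observation 1. The following is a minimal sketch, not the approach taken in the replies to this message. It assumes the variables `id`, `longitude`, and `latitude` from the post, more than 101 observations in memory, and the default line delimiter rather than `#delimit ;`. The local macro `j` is a name introduced here for illustration.

```stata
* Sketch only: assumes the default line delimiter (run "#delimit cr" first
* if "#delimit ;" is in effect) and more than 101 observations in memory.
forvalues n = 1/101 {
    * Subscript of the n-th following observation, wrapping past _N back to 1.
    * The macro stores the expression as text; Stata evaluates it per observation.
    local j mod(_n + `n' - 1, _N) + 1
    generate id`n' = id[`j']
    generate dist`n' = sqrt((longitude - longitude[`j'])^2 + (latitude - latitude[`j'])^2)
}
```

With this subscript, observation `_N` at offset 1 looks up observation 1 instead of returning missing, so the last 100 observations receive values as well. This would stand in for the first loop in the post. The comparison against later candidates (observation 102 onward) is not covered by the sketch.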

**Follow-Ups**:

- **Re: st: calculating nearest neighbors; looping back to the beginning of observations** *From:* "David M. Drukker" <ddrukker@stata.com>
- **Re: st: calculating nearest neighbors; looping back to the beginning of observations** *From:* "Austin Nichols" <austinnichols@gmail.com>

