
# st: Calculating the shortest distances between observations (based on longitude and latitude)

| Field | Value |
| --- | --- |
| From | Rüdiger Vollmeier |
| To | statalist@hsphsun2.harvard.edu |
| Subject | st: Calculating the shortest distances between observations (based on longitude and latitude) |
| Date | Thu, 2 Feb 2012 16:45:09 +0100 |

```
Hello guys,

I want to calculate the shortest distances between observations based
on their coordinates (latitude, longitude). I have adapted a simple
version of N. Cox's nearest neighbor search, which was presented here
some time ago. In contrast to that, I want to calculate not only the
shortest but also the second shortest (third, and so on) distances.

Here is a simplified structure of the dataset:

observation_type  country  year  latitude  longitude
1                 Albania  2010  42.07972  19.52361
1                 Albania  2010  42.15028  19.66389
1                 Albania  2010  42.01667  19.48333
2                 Albania  2010  39.95     20.28333
2                 Albania  2010  42.08417  20.42

I want to calculate the smallest distances from a given observation of
observation_type==1 to the observations of observation_type==2 in the
same country and year. Here is the code (all result variables are
first generated as missing, in the form gen bank_1_dist_1 = .)
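
* A minimal sketch of that initialization, assuming the variable names
* used in the loops below and Stata's default storage type (the post
* does not state a type).
gen bank_1_dist_1 = .
gen bank_1_id_1 = .
gen bank_1_dist_2 = .
gen bank_1_id_2 = .
* One column per candidate observation j, as the first loop expects.
forval j = 1/`=_N' {
    gen bank_2010_1_`j' = .
}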

* Shortest distance
local n = _N
forval i = 1/`n' {
    forval j = 1/`n' {
        if (`i' != `j') & (observation_type[`i']==1) & ///
           (observation_type[`j']==2) & ///
           (country_number[`i']==country_number[`j']) & (year[`i']==year[`j']) {
            local d = (latitude[`i'] - latitude[`j'])^2 + ///
                (longitude[`i'] - longitude[`j'])^2
            replace bank_2010_1_`j' = `d' in `i'
            if `d' < bank_1_dist_1[`i'] {
                replace bank_1_dist_1 = `d' in `i'
                replace bank_1_id_1 = `j' in `i'
            }
        }
    }
}
* Second shortest distance
local n = _N
forval i = 1/`n' {
    forval j = 1/`n' {
        if (`i' != `j') & (observation_type[`i']==1) & ///
           (observation_type[`j']==2) & ///
           (country_number[`i']==country_number[`j']) & (year[`i']==year[`j']) {
            local d2 = (latitude[`i'] - latitude[`j'])^2 + ///
                (longitude[`i'] - longitude[`j'])^2
            if (`d2' > bank_1_dist_1[`i']) & (`d2' < bank_1_dist_2[`i']) {
                replace bank_1_dist_2 = `d2' in `i'
                replace bank_1_id_2 = `j' in `i'
            }
        }
    }
}

Here is the problem: the shortest distance seems to be calculated
correctly. However, the second shortest distance is not (sometimes it
takes on the same value as the shortest distance, and only sometimes
is it the actual second shortest distance). Do you know why? Do you
have any suggestions for improvement?