
# Re: st: Calculating the shortest distances between observations (based on longitude and latitude)

| From | Robert Picard |
| --- | --- |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Calculating the shortest distances between observations (based on longitude and latitude) |
| Date | Thu, 2 Feb 2012 11:21:48 -0500 |

```
Take a look at -geonear- and -geodist-, both available from SSC. If
you have only two observation types, then the simplest approach is to
form all pairwise combinations of locations and then calculate the
distances.

*----------- begin example -------------
version 12

clear
input otype str10 country year lat lon
1 Albania 2010 42.07972 19.52361
1 Albania 2010 42.15028 19.66389
1 Albania 2010 42.01667 19.48333
2 Albania 2010 39.95 20.28333
2 Albania 2010 42.08417 20.42
end

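* save type 1 and type 2 observations separately
tempfile main type2
save "`main'"
* type 2: add the suffix 2 to every variable name so that names
* stay unique once the two sets are combined with -cross-
keep if otype == 2
rename * *2
gen id2 = _n
save "`type2'"
* type 1: reload the full data and keep only type 1 observations
use "`main'"
keep if otype == 1
gen id1 = _n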

* form all pairwise combinations and calculate the distance
* (-geodist- reports kilometers unless the miles option is specified)
cross using "`type2'"
geodist lat lon lat2 lon2, gen(d)
sort id1 d
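
* Sketch only, not part of the original reply: restrict the pairs to the
* same country and year, as the question requires, and number each
* type 1 observation's neighbors from nearest to farthest. The variable
* name nrank is an assumption chosen for illustration.
keep if country == country2 & year == year2
bysort id1 (d): gen nrank = _n
* nrank == 1 is the nearest type 2 observation, nrank == 2 the second
* nearest, and so on
list id1 id2 d nrank, sepby(id1)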
*------------ end example --------------

> Hello guys,
>
> I want to calculate the shortest distances between observations based
> on the coordinates (latitude, longitude). I have adapted a simple
> version from N. Cox's nearest neighbor search which was presented here
> some time ago. In contrast to that, I want to calculate not only the
> shortest but also the second shortest (third, and so on) distances.
>
> Here is a simplified structure of the dataset:
>
> observation_type  country  year  latitude  longitude
> 1                 Albania  2010  42.07972  19.52361
> 1                 Albania  2010  42.15028  19.66389
> 1                 Albania  2010  42.01667  19.48333
> 2                 Albania  2010  39.95     20.28333
> 2                 Albania  2010  42.08417  20.42
>
> I want to calculate the smallest distances from a given observation of
> observation_type=1 to the observations of type=2 in the same year and
> the same country. Here is the code (all result variables are first
> created as missing, in the form gen bank_1_dist_1 =.)
>
> * Shortest distance
> local n = _N
> forval i = 1/`n' {
>     forval j = 1/`n' {
>         if (`i' != `j') & (observation_type[`i']==1) & ///
>            (observation_type[`j']==2) & ///
>            (country_number[`i']==country_number[`j']) & (year[`i']==year[`j']) {
>             local d = (latitude[`i'] - latitude[`j'])^2 + ///
>                 (longitude[`i'] - longitude[`j'])^2
>             replace bank_2010_1_`j'=`d' in `i'
>             if `d' < bank_1_dist_1[`i'] {
>                 replace bank_1_dist_1 = `d' in `i'
>                 replace bank_1_id_1 = `j' in `i'
>             }
>         }
>     }
> }
> * Second shortest distance
> local n = _N
> forval i = 1/`n' {
>     forval j = 1/`n' {
>         if (`i' != `j') & (observation_type[`i']==1) & ///
>            (observation_type[`j']==2) & ///
>            (country_number[`i']==country_number[`j']) & (year[`i']==year[`j']) {
>             local d2 = (latitude[`i'] - latitude[`j'])^2 + ///
>                 (longitude[`i'] - longitude[`j'])^2
>             if (`d2' > bank_1_dist_1[`i']) & (`d2' < bank_1_dist_2[`i']) {
>                 replace bank_1_dist_2 = `d2' in `i'
>                 replace bank_1_id_2 = `j' in `i'
>             }
>         }
>     }
> }
>
> Here is the problem: the shortest distance seems to be calculated
> correctly. However, the second shortest distance is not (sometimes it
> takes on the same value as the shortest distance, and only sometimes
> is it the actual second shortest distance). Do you know why? Do you
> have any suggestions for improvement?
>
> Ruediger
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
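
The reply does not say why the quoted loop sometimes reports the shortest distance again as the second shortest. One plausible cause, offered here as an assumption and not as a diagnosis from the thread, is storage precision. A variable created with `gen bank_1_dist_1 =.` is a float by default, while the local macro holds the distance in (near) double precision. The stored minimum can therefore be slightly smaller than the value recomputed in the second loop, so the test `d2 > bank_1_dist_1` can succeed for the nearest neighbor itself. A minimal sketch of the effect, using hypothetical variables `x` and `y` and the value 0.1:

```stata
* Sketch under the assumption described above; x and y are hypothetical
clear
set obs 1

gen x = .                  // float by default, like gen bank_1_dist_1 = .
local d = 0.1
replace x = `d' in 1       // the value is rounded to float precision
display `d' == x[1]        // 0: the stored float is not equal to the double

gen double y = .           // same steps with double storage
replace y = `d' in 1
display `d' == y[1]        // 1: the comparison now behaves as expected
```

If this is the cause, creating the distance variables with `gen double` or wrapping the comparison in `float()` should remove the symptom. Sorting the pairwise distances within each type 1 observation, as in the reply, avoids the comparison altogether.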