Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Calculating the shortest distances between observations (based on longitude and latitude)

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Calculating the shortest distances between observations (based on longitude and latitude)
Date: Thu, 2 Feb 2012 16:23:21 +0000

This reference to my work is rather elusive as "presented here some
time ago" includes what may be fairly described as several emails.

However, I don't think I ever posted code that is smart about
distances between (latitude, longitude) pairs. If anyone knows that I
am a geographer they might be surprised to learn that I never do
this, but it's true.

More seriously: I want to emphasise, although it's elementary, that
calculating distances using Pythagoras' theorem by treating latitude
and longitude as if they were planar coordinates can be at best only
an approximation within very small areas. It's not a calculation I'd
recommend, not least because there are better routines out there.
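
For illustration, here is a minimal sketch of a great-circle (haversine) distance in Stata. It is not code from this thread. It assumes a spherical Earth of radius 6371 km, and the variable names lat1, lon1, lat2, lon2 are hypothetical. The two points are taken from the sample data quoted below.

```stata
* Haversine (great-circle) distance between two points given in degrees.
* Assumption: spherical Earth with mean radius 6371 km.
clear
input double(lat1 lon1 lat2 lon2)
42.07972 19.52361 39.95 20.28333
end

* a = haversine of the central angle; _pi/180 converts degrees to radians
gen double a = sin((lat2 - lat1) * _pi / 360)^2 + ///
    cos(lat1 * _pi / 180) * cos(lat2 * _pi / 180) * ///
    sin((lon2 - lon1) * _pi / 360)^2

* min(1, ...) guards against rounding pushing sqrt(a) just above 1
gen double dist_km = 2 * 6371 * asin(min(1, sqrt(a)))

list lat1 lon1 lat2 lon2 dist_km
```

Unlike a planar calculation on raw degrees, this allows for the fact that a degree of longitude shrinks with the cosine of latitude.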

Nick

> Hello guys,
>
> I want to calculate the shortest distances between observations based
> on the coordinates (latitude, longitude). I have adapted a simple
> version from N. Cox's nearest neighbor search which was presented here
> some time ago. In contrast to that, I want to calculate not only the
> shortest but also the second shortest (third, and so on) distances.
>
> Here is a simplified structure of the dataset:
>
> observation_type  country  year  latitude  longitude
> 1                 Albania  2010  42.07972  19.52361
> 1                 Albania  2010  42.15028  19.66389
> 1                 Albania  2010  42.01667  19.48333
> 2                 Albania  2010  39.95     20.28333
> 2                 Albania  2010  42.08417  20.42
>
> I want to calculate the smallest distances for a given observation of
> observation_type=1 to an observation of type=2 for a given year in a
> given country. Here is the code (all variables are generated of the
> form gen bank_1_dist_1 =.)
>
> * Shortest distance
> local n = _N
> forval i = 1/`n' {
>     forval j = 1/`n' {
>         if (`i' != `j') & (observation_type[`i']==1) & ///
>             (observation_type[`j']==2) & ///
>             (country_number[`i']==country_number[`j']) & (year[`i']==year[`j']) {
>             local d = (latitude[`i'] - latitude[`j'])^2 + ///
>                 (longitude[`i'] - longitude[`j'])^2
>             replace bank_2010_1_`j'=`d' in `i'
>             if `d' < bank_1_dist_1[`i'] {
>                 replace bank_1_dist_1 = `d' in `i'
>                 replace bank_1_id_1 = `j' in `i'
>             }
>         }
>     }
> }
> * Second shortest distance
> local n = _N
> forval i = 1/`n' {
>     forval j = 1/`n' {
>         if (`i' != `j') & (observation_type[`i']==1) & ///
>             (observation_type[`j']==2) & ///
>             (country_number[`i']==country_number[`j']) & (year[`i']==year[`j']) {
>             local d2 = (latitude[`i'] - latitude[`j'])^2 + ///
>                 (longitude[`i'] - longitude[`j'])^2
>             if (`d2' > bank_1_dist_1[`i']) & (`d2' < bank_1_dist_2[`i']) {
>                 replace bank_1_dist_2 = `d2' in `i'
>                 replace bank_1_id_2 = `j' in `i'
>             }
>         }
>     }
> }
>
> Here is the problem: The shortest distance seems to be well
> calculated. However, the second smallest distance is not calculated
> correctly (sometimes it takes on the same value as the shortest
> distance and only sometimes it is the actual shortest distance). Do
> you know why? Do you have any suggestions for improvement?
>
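
The task described in the quoted question (nearest, second nearest, and so on, within country and year) can also be sketched without a separate loop for each rank: form all type-1/type-2 pairs, compute a distance for each pair, and rank the pairs within each type-1 observation. The following is a sketch under stated assumptions, not the poster's code and not code from this thread. The identifier id and the names id2, lat2, lon2, a, dist_km and rank are introduced here for illustration, the Earth is treated as a sphere of radius 6371 km, and the grouped rename syntax needs Stata 12 or later.

```stata
* Sample data as given in the quoted question
clear
input byte observation_type str10 country int year double(latitude longitude)
1 "Albania" 2010 42.07972 19.52361
1 "Albania" 2010 42.15028 19.66389
1 "Albania" 2010 42.01667 19.48333
2 "Albania" 2010 39.95    20.28333
2 "Albania" 2010 42.08417 20.42
end
gen long id = _n   // hypothetical observation identifier

* Save the type-2 observations as candidate neighbours
preserve
keep if observation_type == 2
keep id country year latitude longitude
rename (id latitude longitude) (id2 lat2 lon2)
tempfile type2
save `type2'
restore

* Pair each type-1 observation with every type-2 observation
* in the same country and year
keep if observation_type == 1
joinby country year using `type2'

* Haversine distance in km, held in double precision
gen double a = sin((lat2 - latitude) * _pi / 360)^2 + ///
    cos(latitude * _pi / 180) * cos(lat2 * _pi / 180) * ///
    sin((lon2 - longitude) * _pi / 360)^2
gen double dist_km = 2 * 6371 * asin(min(1, sqrt(a)))

* rank 1 = nearest, rank 2 = second nearest, ...; ties broken by id2
bysort id (dist_km id2): gen int rank = _n
list id id2 dist_km rank, sepby(id)
```

Keeping distances in double variables and ranking by sorting avoids comparing a full-precision local macro with a value stored in a float variable. Such comparisons are one plausible source of a "second shortest" distance that merely repeats the shortest one, because `gen` creates float variables by default.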
