st: new -geonear- and -geodist- packages available from SSC


From   Robert Picard <picard@netbox.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: new -geonear- and -geodist- packages available from SSC
Date   Sun, 25 Apr 2010 19:30:44 -0400

Thanks to Kit Baum, two new packages called -geonear- and -geodist-
are available on SSC. Stata 9.2 or higher is required. To view the
help files:

. ssc type geonear.hlp
. ssc type geodist.hlp

To install:

. ssc install geonear
. ssc install geodist

-geodist- computes geodetic distances, i.e. the length of the shortest
curve between two points along the surface of a mathematical model of
the earth. By default, -geodist- calculates distances on the WGS 1984
reference ellipsoid. -geodist- can also calculate distances on
user-specified ellipsoids. If execution time is a concern, -geodist-
can calculate great-circle distances instead.
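To make the great-circle option concrete, here is a minimal sketch in
Python (not Stata, and not the -geodist- source). It uses the haversine
formula on a sphere. The function name great_circle_km and the mean
radius of 6371 km are assumptions chosen for illustration; the
ellipsoidal calculation that -geodist- performs by default is more
involved and is not shown.

```python
import math

# Mean Earth radius in km: an assumed value for this sketch, not
# necessarily the radius -geodist- uses.
EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

# One degree of longitude along the equator
print(round(great_circle_km(0.0, 0.0, 0.0, 1.0), 2))
```

A sphere is cheaper to evaluate than an ellipsoid because the distance
has this closed form, which is why it is the faster choice when
execution time matters.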

-geonear- is a program to identify the nearest neighbors using
geodetic distances. The current dataset in memory contains what
-geonear- refers to as base locations. The set of neighbor locations
is found in the using dataset. For each base location, -geonear-
identifies the nearest neighbor(s) by calculating distances to all
potential neighbors.

For small problems this is easy to do, but it quickly becomes
inefficient as the number of base and neighbor locations increases,
because the number of distances to compute is the product of the two
counts. I wrote -geonear- because I needed a way to find the nearest
neighbors when the base and neighbor locations number in the millions.
My solution is a divide-and-conquer approach in which base locations
are recursively split into two smaller regions (lat/lon quadrangles).
Each time, the set of potential neighbor locations is also reduced to
those that are within or close to the smaller region. When the number
of base locations within a region falls below a certain threshold,
-geonear- calculates distances to a significantly reduced set of
potential neighbor locations. The neighbor reduction algorithm was
designed to ensure that -geonear- identifies the same nearest
neighbors that would be identified if distances to all neighbors were
computed.
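The idea can be sketched in Python as follows. This is not the
-geonear- implementation: the function names, the median split, and the
pruning rule are all assumptions made for illustration, and distances
are great-circle rather than ellipsoidal. The pruning rule relies on
the triangle inequality: if c is one base point of a region, r is the
largest distance from c to any base point in the region, and m is the
distance from c to its closest candidate, then no neighbor farther than
2r + m from c can be the nearest neighbor of any base point in that
region.

```python
import math
import random

def great_circle_km(p, q, radius=6371.0):
    """Haversine distance between (lat, lon) pairs in decimal degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dlam = math.radians(q[1] - p[1])
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

def brute_force(bases, nbors):
    """Baseline: for each base index, the index of its nearest neighbor."""
    return {i: min(range(len(nbors)),
                   key=lambda j: great_circle_km(bases[i], nbors[j]))
            for i in range(len(bases))}

def nearest_neighbors(bases, nbors, threshold=50, slack_km=0.001):
    """Divide and conquer over base points; threshold must be >= 1."""
    result = {}

    def shrink(base_ids, nbor_ids):
        # Keep only neighbors that could be nearest to some base point here.
        c = bases[base_ids[0]]
        r = max(great_circle_km(c, bases[i]) for i in base_ids)
        d = {j: great_circle_km(c, nbors[j]) for j in nbor_ids}
        m = min(d.values())
        # slack_km absorbs floating-point rounding in the bound
        return [j for j in nbor_ids if d[j] <= 2 * r + m + slack_km]

    def solve(base_ids, nbor_ids):
        if len(base_ids) <= threshold:
            for i in base_ids:
                result[i] = min(
                    nbor_ids,
                    key=lambda j: great_circle_km(bases[i], nbors[j]))
            return
        # Split at the median of whichever coordinate spans more degrees.
        # (A naive span; it affects only split quality, not correctness.)
        lats = [bases[i][0] for i in base_ids]
        lons = [bases[i][1] for i in base_ids]
        axis = 0 if max(lats) - min(lats) >= max(lons) - min(lons) else 1
        ordered = sorted(base_ids, key=lambda i: bases[i][axis])
        mid = len(ordered) // 2
        for half in (ordered[:mid], ordered[mid:]):
            solve(half, shrink(half, nbor_ids))

    solve(list(range(len(bases))), list(range(len(nbors))))
    return result

# Small check on made-up random points: both methods should agree.
random.seed(1)
demo_bases = [(random.uniform(-60, 60), random.uniform(-180, 180))
              for _ in range(300)]
demo_nbors = [(random.uniform(-60, 60), random.uniform(-180, 180))
              for _ in range(400)]
print(nearest_neighbors(demo_bases, demo_nbors) == brute_force(demo_bases, demo_nbors))
```

The threshold argument plays the role described in the text: below it,
the sketch stops splitting and computes distances to whatever
candidates remain. The bound used in shrink is deliberately loose, so
it prunes less aggressively than a tuned implementation would.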

-geonear- must trade off the reduction in the number of computed
distances against the additional overhead of splitting the dataset
into smaller and smaller regions. Extensive testing suggests that in
most cases, no further efficiency gain is obtained by splitting a
region when the number of base locations falls below 50. -geonear-
automatically determines the threshold that minimizes total execution
time.

Note that reducing Stata's memory allocation may significantly improve
the performance of -geonear-, at least with the current version of
Stata/MP 11 on the Mac. Normally, I have 1g
permanently allocated to Stata (set memory 1g, perm) but the first
example in the help file of -geonear- (32760 base and 32760 neighbor
locations) runs about 2.5 times faster under a 10m allocation. I don't
see any differences when I use Stata/SE 9.2.

Robert
http://robertpicard.com/