Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Robert Picard <picard@netbox.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: new -geonear- and -geodist- packages available from SSC
Date: Sun, 25 Apr 2010 19:30:44 -0400

Thanks to Kit Baum, two new packages called -geonear- and -geodist- are available on SSC. Stata 9.2 or higher is required.

To view the help files:

. ssc type geonear.hlp
. ssc type geodist.hlp

To install:

. ssc install geonear
. ssc install geodist

-geodist- computes geodetic distances, i.e. the length of the shortest curve between two points along the surface of a mathematical model of the earth. By default, -geodist- calculates distances on the WGS 1984 reference ellipsoid. -geodist- can also calculate distances on user-specified ellipsoids. If execution time is a concern, -geodist- can calculate great-circle distances instead.

-geonear- is a program to identify the nearest neighbors using geodetic distances. The current dataset in memory contains what -geonear- refers to as base locations. The set of neighbor locations is found in the using dataset. For each base location, -geonear- identifies the nearest neighbor(s) by calculating distances to all potential neighbors. For small problems, this is easy to do, but it quickly becomes inefficient as the number of base and neighbor locations increases.

I wrote -geonear- because I needed a way to find the nearest neighbors when the base and neighbor locations number in the millions. My solution is a divide-and-conquer approach in which base locations are recursively split into two smaller regions (lat/lon quadrangles). At each split, the set of potential neighbor locations is also reduced to those that are within or close to the smaller region. When the number of base locations within a region falls below a certain threshold, -geonear- calculates distances to a significantly reduced set of potential neighbor locations. The neighbor reduction algorithm was designed to ensure that -geonear- identifies the same nearest neighbors that would be identified if distances to all neighbors were computed.
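
To make the brute-force baseline concrete, here is a minimal Python sketch of what "calculating distances to all potential neighbors" involves. It is an illustration only, not the code of -geonear- or -geodist-. It assumes a spherical earth with a mean radius of 6371 km (great-circle distances rather than distances on the WGS 1984 ellipsoid), and the names `great_circle_km` and `nearest_neighbors` are hypothetical.

```python
import math

# Assumption: spherical earth with a mean radius of 6371 km.
# -geodist- defaults to the WGS 1984 ellipsoid; this sketch does not.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs are decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_neighbors(base, neighbors):
    """Brute force: for each base (id, lat, lon), scan every neighbor.

    Returns a list of (base_id, nearest_neighbor_id, distance_km).
    """
    result = []
    for b_id, b_lat, b_lon in base:
        best_id, best_km = None, math.inf
        for n_id, n_lat, n_lon in neighbors:
            d = great_circle_km(b_lat, b_lon, n_lat, n_lon)
            if d < best_km:
                best_id, best_km = n_id, d
        result.append((b_id, best_id, best_km))
    return result
```

The nested loop computes one distance per base/neighbor pair, so 32760 base and 32760 neighbor locations already require about 1.07 billion distance calculations. Any scheme that restricts the inner loop to neighbors within or close to a base location's region must still return exactly what this exhaustive scan returns.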

-geonear- must trade off the reduction in the number of computed distances against the additional overhead of splitting the dataset into smaller and smaller regions. Extensive testing suggests that in most cases, no further efficiency gain is obtained by splitting a region when the number of base locations falls below 50. -geonear- automatically determines the best threshold to minimize total execution times.

It should be noted that reducing Stata's memory allocation may significantly improve the performance of -geonear-, at least with the current version of Stata/MP 11 on the Mac. Normally, I have 1g permanently allocated to Stata (set memory 1g, perm) but the first example in the help file of -geonear- (32760 base and 32760 neighbor locations) runs about 2.5 times faster under a 10m allocation. I don't see any differences when I use Stata/SE 9.2.

Robert
http://robertpicard.com/