
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: new -geonear- and -geodist- packages available from SSC

From	Robert Picard <[email protected]>
To	[email protected]
Subject	st: new -geonear- and -geodist- packages available from SSC
Date	Sun, 25 Apr 2010 19:30:44 -0400

Thanks to Kit Baum, two new packages called -geonear- and -geodist-
are available on SSC. Stata 9.2 or higher is required. To view the
help files:

. ssc type geonear.hlp
. ssc type geodist.hlp

To install:

. ssc install geonear
. ssc install geodist

-geodist- computes geodetic distances, i.e. the length of the shortest
curve between two points along the surface of a mathematical model of
the earth. By default, -geodist- calculates distances on the WGS 1984
reference ellipsoid. -geodist- can also calculate distances on
user-specified ellipsoids. If execution time is a concern, -geodist-
can calculate great-circle distances instead.
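
As a rough illustration of the great-circle alternative mentioned
above, here is a minimal Python sketch of the haversine formula on a
sphere. It is not -geodist-'s code: the function name and the 6371 km
mean radius are assumptions made for this example, and it does not
attempt the ellipsoidal calculation that -geodist- uses by default.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius; not taken from -geodist-


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin() out of its domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))
```

A sphere is cheaper to evaluate than an ellipsoid because the distance
has this closed form, which fits the execution-time remark above.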

-geonear- is a program to identify the nearest neighbors using
geodetic distances. The current dataset in memory contains what
-geonear- refers to as base locations. The set of neighbor locations
is found in the using dataset. For each base location, -geonear-
identifies the nearest neighbor(s) by calculating distances to all
potential neighbors.

For small problems, computing every base-to-neighbor distance is easy,
but the number of distances grows with the product of the number of
base and neighbor locations, so this quickly becomes inefficient. I
wrote -geonear- because I needed a way to find the nearest neighbors
when the base and neighbor locations number in the millions. My
solution is a divide-and-conquer approach in which the base locations
are recursively split into two smaller regions (lat/lon quadrangles).
Each time, the set of potential neighbor locations is also reduced to
those that are within or close to the smaller region. When the number
of base locations within a region falls below a certain threshold,
-geonear- calculates distances to a significantly reduced set of
potential neighbor locations. The neighbor reduction algorithm was
designed to ensure that -geonear- identifies the same nearest
neighbors that would be identified if distances to all neighbors were
computed.
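
The idea can be sketched in Python under some loud assumptions. This
is not -geonear-'s algorithm: the pruning rule below is a simple
triangle-inequality bound chosen for the example, distances are
great-circle (haversine) on a sphere with an assumed 6371 km radius,
and every name, including the `threshold` default, is hypothetical.
It only shows how recursive splitting with neighbor reduction can
return the same answer as computing all distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, for illustration only


def gc_km(p, q):
    """Haversine great-circle distance between (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dlam = math.radians(q[1] - p[1])
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def brute_force(bases, nbors):
    """Return {base id: (distance, neighbor id)} by computing every distance."""
    return {bid: min((gc_km(bpos, npos), nid) for nid, npos in nbors)
            for bid, bpos in bases}


def nearest(bases, nbors, threshold=50):
    """Split the bases recursively, shrinking the candidate set each time.

    bases and nbors are lists of (id, (lat, lon)); nbors must be non-empty.
    """
    if not bases:
        return {}
    # Enclose this region's bases in a circle around its first base location.
    center = bases[0][1]
    radius = max(gc_km(center, bpos) for _, bpos in bases)
    dists = [(gc_km(center, npos), nid, npos) for nid, npos in nbors]
    d0 = min(d for d, _, _ in dists)
    # Triangle inequality: the nearest neighbor of any base in the circle is
    # within 2 * radius + d0 of the center (small slack absorbs rounding).
    keep = [(nid, npos) for d, nid, npos in dists
            if d <= 2 * radius + d0 + 1e-6]
    if len(bases) <= threshold:
        return brute_force(bases, keep)
    # Split at the median along the wider of the lat / lon extents.
    lats = [bpos[0] for _, bpos in bases]
    lons = [bpos[1] for _, bpos in bases]
    axis = 0 if max(lats) - min(lats) >= max(lons) - min(lons) else 1
    ordered = sorted(bases, key=lambda b: b[1][axis])
    mid = len(ordered) // 2
    result = nearest(ordered[:mid], keep, threshold)
    result.update(nearest(ordered[mid:], keep, threshold))
    return result
```

Calling nearest(bases, nbors) returns, for each base id, the distance
to and id of its nearest neighbor. Because the bound never discards a
neighbor that could be nearest to some base in the region, the result
matches brute_force(bases, nbors) while computing far fewer distances
once the regions become small.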

-geonear- must trade off the reduction in the number of computed
distances against the additional overhead of splitting the dataset
into smaller and smaller regions. Extensive testing suggests that in
most cases, no further efficiency gain is obtained by splitting a
region when the number of base locations falls below 50. -geonear-
automatically determines the best threshold to minimize total
execution time.

Note that reducing Stata's memory allocation may significantly improve
the performance of -geonear-, at least with the current version of
Stata/MP 11 on the Mac. Normally, I have 1g permanently allocated to
Stata (set memory 1g, perm), but the first example in the help file
of -geonear- (32760 base and 32760 neighbor locations) runs about 2.5
times faster under a 10m allocation. I see no such difference when I
use Stata/SE 9.2.

Robert
http://robertpicard.com/