
# Re: st: Matching procedure based on shortest distance given latitudes and longitudes

 From Rüdiger Vollmeier To statalist@hsphsun2.harvard.edu Subject Re: st: Matching procedure based on shortest distance given latitudes and longitudes Date Thu, 9 Feb 2012 20:37:34 +0100

```
Thanks to Robert for this smart and elegant way of dealing with this problem.

However, if there are fewer observations in 2010 than in 2006,
matchid=1 is generated for all 2006 observations with id1=1, even though
there is no shortest distance associated with this id.

What would be an equally elegant way of solving this problem?

Ruediger.

2012/2/9 Robert Picard <picard@netbox.com>:
> As I mentioned to you a few days ago, you do not need a special
> program to find the nearest neighbors. You can simply use -cross- to
> form all pairwise combinations of 2006 and 2010 observations, compute
> all the distances, and then sort. I've added some code that does, I
> think, the matching you describe.
>
> Robert
>
> *----------- begin example -------------
> version 12
>
> set seed 1234
>
> * save 2010 observations separately
> clear
> set obs 10
> gen id2 = _n
> gen lat2 = 40 + runiform() * 5
> gen lon2 = 19 + runiform() * 5
> tempfile y2010
> save "`y2010'"
>
> * create 7 obs for 2006
> clear
> local nobs2006 7
> set obs `nobs2006'
> gen id1 = _n
> gen lat1 = 40 + runiform() * 5
> gen lon1 = 19 + runiform() * 5
>
> * form all pairwise combinations and compute distance
> cross using "`y2010'"
> * user-written program, to install: ssc install geodist
> geodist lat1 lon1 lat2 lon2, gen(d)
>
>
> * keep a copy of the distances; d is set to missing as ids are used up
> gen d0 = d
> gen matchid = .
> gen matchd = .
>
> forvalues i = 1/`nobs2006' {
>        * closest pair among the ids not yet matched
>        qui sum d
>        scalar mind = r(min)
>        qui sum id1 if d == mind
>        local bestid1 = r(min)
>        qui sum id2 if d == mind
>        local bestid2 = r(min)
>        * record the match on every row of this 2006 id
>        qui replace matchid = `bestid2' if id1 == `bestid1'
>        qui replace matchd = mind if id1 == `bestid1'
>        * exclude both ids from later iterations
>        qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>        dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
> }
>
> sort id1 d0 id2
>
> *------------ end example --------------
>
>
>
>> Hello guys,
>>
>> I want to match each observation in a given year with one
>> observation in another year based on the shortest geographical
>> distance between them, given the latitudes and longitudes of each
>> observation.
>>
>> For example, the simplified structure of the dataset looks as follows:
>>
>> id   year   longitude   latitude
>> 1    2006   19.923      40.794
>> 2    2006   19.949      40.711
>> 1    2010   19.940      40.721
>> 2    2010   22.001      50.122
>>
>> Hence, I would like to match each observation in 2006 with the one
>> observation in 2010 that is closest AND that has not already been
>> matched to another observation in 2006.
>>
>> The previously discussed -nearstat- command (thanks to Wilner!) cannot
>> be applied directly to this problem because it can match the same
>> 2010 observation with multiple 2006 observations (in this example,
>> the 2010 observation with id 1 is closest to both 2006 observations
>> and hence would be matched to both).
>>
>> Does anybody have an idea for a nice solution or is there even a
>> command out there that would match based on distance given the
>> latitudes and longitudes?
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
```
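A note on the behavior Ruediger describes: once every 2010 id has been used, all values of `d` are missing, so `r(min)` after `qui sum d` is missing and `d == mind` is true on every row; the lowest `id1` and `id2` (both 1) are then selected and overwrite the earlier match for `id1 == 1`. One possible guard, untested here, is to leave the loop when `r(N) == 0` right after `qui sum d` (for example with `continue, break`), or to loop only as many times as the smaller year has observations.

The sketch below expresses the same greedy procedure in Python rather than Stata, so the logic can be checked on its own. It assumes a spherical Earth with a 6,371 km radius (haversine formula), whereas -geodist- by default uses an ellipsoidal model, so distances will differ slightly. The names `haversine_km` and `greedy_match` are hypothetical. Like Robert's loop, this greedy rule does not guarantee the smallest total distance over all matches; that would need an optimal assignment method such as the Hungarian algorithm.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def greedy_match(
    pts_a: Dict[int, Point], pts_b: Dict[int, Point]
) -> Dict[int, Tuple[int, float]]:
    """Greedily match ids in pts_a to ids in pts_b without replacement.

    Takes the closest remaining (a, b) pair, records it, and excludes both ids.
    Stops once either side is exhausted, so when pts_b is smaller some a ids
    are left unmatched instead of receiving a spurious match.
    """
    # All pairwise distances (the analogue of -cross- followed by -geodist-),
    # sorted by distance, then by a id, then by b id.
    pairs = sorted(
        (haversine_km(*pts_a[i], *pts_b[j]), i, j) for i in pts_a for j in pts_b
    )
    used_a, used_b = set(), set()
    matches: Dict[int, Tuple[int, float]] = {}
    for d, i, j in pairs:
        if i in used_a or j in used_b:
            continue  # one side of this pair is already matched
        matches[i] = (j, d)
        used_a.add(i)
        used_b.add(j)
        if len(used_a) == len(pts_a) or len(used_b) == len(pts_b):
            break  # nothing left to match on one side
    return matches


if __name__ == "__main__":
    # The four observations from the original question, as (lat, lon).
    y2006 = {1: (40.794, 19.923), 2: (40.711, 19.949)}
    y2010 = {1: (40.721, 19.940), 2: (50.122, 22.001)}
    for id2006, (id2010, dist) in sorted(greedy_match(y2006, y2010).items()):
        print(f"2006 id {id2006} -> 2010 id {id2010} at {dist:.2f} km")
```

On the four observations from the question, 2006 id 2 is paired with 2010 id 1 and 2006 id 1 with 2010 id 2. If 2010 id 2 is dropped, 2006 id 1 is simply left unmatched. Scanning the sorted pair list once gives the same result as repeatedly searching for the smallest remaining distance, because excluding used ids only removes pairs from consideration.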