# Re: st: Matching procedure based on shortest distance given latitudes and longitudes

From: Rüdiger Vollmeier
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Matching procedure based on shortest distance given latitudes and longitudes
Date: Thu, 9 Feb 2012 21:56:25 +0100

```
Thanks for the replies.

2012/2/9 Nick Cox <n.j.cox@durham.ac.uk>:
> As the purpose of each -summarize- is to find the minimum, and no more, each could be done -, meanonly-. As the -summarize-s are repeated, the speed-up may be discernible.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Robert Picard
>
> That works. A better way is to break out of the loop:
>
> forvalues i = 1/`nobs2006' {
>        qui sum d
>        scalar mind = r(min)
>        if mi(mind) continue, break
>        qui sum id1 if d == mind
>        local bestid1 = r(min)
>        qui sum id2 if d == mind
>        local bestid2 = r(min)
>        qui replace matchid = `bestid2' if id1 == `bestid1'
>        qui replace matchd = mind if id1 == `bestid1'
>        qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>        dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
> }
>
>>
>> i.e.,
>>
>> forvalues i = 1/`nobs2006' {
>>        qui sum d
>>        if r(N) != 0 {
>>                scalar mind = r(min)
>>                qui sum id1 if d == mind
>>                local bestid1 = r(min)
>>                qui sum id2 if d == mind
>>                local bestid2 = r(min)
>>                qui replace matchid = `bestid2' if id1 == `bestid1'
>>                qui replace matchd = mind if id1 == `bestid1'
>>                qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>>                dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
>>        }
>> }
>>
>> On 9 February 2012 at 20:37, Rüdiger Vollmeier wrote:
>>> Thanks to Robert for this smart and elegant way of dealing with this problem.
>>>
>>> However, if there are fewer observations in 2010 than in 2006,
>>> matchid=1 is generated for all 2006 id=1 observations - even though
>>> there is no shortest distance associated with this id.
>>>
>>> What would be an equally elegant way of solving this problem?
>>>
>>> Ruediger.
>>>
>>>
>>>
>>>
>>> 2012/2/9 Robert Picard <picard@netbox.com>:
>>>> As I mentioned to you a few days ago, you do not need a special
>>>> program to find the nearest neighbors. You can simply use -cross- to
>>>> form all pairwise combinations of 2006 and 2010 observations, compute
>>>> all the distances, and then sort. I've added some code that does, I
>>>> think, the matching you describe.
>>>>
>>>> Robert
>>>>
>>>> *----------- begin example -------------
>>>> version 12
>>>>
>>>> set seed 1234
>>>>
>>>> * save 2010 observations separately
>>>> clear
>>>> set obs 10
>>>> gen id2 = _n
>>>> gen lat2 = 40 + runiform() * 5
>>>> gen lon2 = 19 + runiform() * 5
>>>> tempfile y2010
>>>> save "`y2010'"
>>>>
>>>> * create 7 obs for 2006
>>>> clear
>>>> local nobs2006 7
>>>> set obs `nobs2006'
>>>> gen id1 = _n
>>>> gen lat1 = 40 + runiform() * 5
>>>> gen lon1 = 19 + runiform() * 5
>>>>
>>>> * form all pairwise combinations and compute distance
>>>> cross using "`y2010'"
>>>> * user-written program, to install: ssc install geodist
>>>> geodist lat1 lon1 lat2 lon2, gen(d)
>>>>
>>>>
>>>> gen d0 = d
>>>> gen matchid = .
>>>> gen matchd = .
>>>>
>>>> forvalues i = 1/`nobs2006' {
>>>>        qui sum d
>>>>        scalar mind = r(min)
>>>>        qui sum id1 if d == mind
>>>>        local bestid1 = r(min)
>>>>        qui sum id2 if d == mind
>>>>        local bestid2 = r(min)
>>>>        qui replace matchid = `bestid2' if id1 == `bestid1'
>>>>        qui replace matchd = mind if id1 == `bestid1'
>>>>        qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>>>>        dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
>>>> }
>>>>
>>>> sort id1 d0 id2
>>>>
>>>> *------------ end example --------------
>>>>
>>>>
>>>>
>>>>> Hello guys,
>>>>>
>>>>> I want to match each observation in a given year with
>>>>> one observation in another year based on the shortest geographical
>>>>> distance between them given the latitudes and longitudes of each
>>>>> observation.
>>>>>
>>>>> I.e. the simplified structure of the dataset looks as follows:
>>>>>
>>>>> id      year      longitude    latitude
>>>>> 1       2006      19.923       40.794
>>>>> 2       2006      19.949       40.711
>>>>> 1       2010      19.940       40.721
>>>>> 2       2010      22.001       50.122
>>>>>
>>>>> Hence, I would like to match each observation in 2006 with the one
>>>>> observation in 2010 that is closest AND that had not been matched to
>>>>> any observation in 2006 before.
>>>>>
>>>>> The previously discussed -nearstat- command (thanks to Wilner!) cannot
>>>>> be applied directly to this problem as it could match the same
>>>>> observation in 2010 with multiple observations in 2006 (i.e. in this
>>>>> example, the year 2010 observation with id 1 is closest to both
>>>>> observations in 2006 - and hence would be matched).
>>>>>
>>>>> Does anybody have an idea for a nice solution or is there even a
>>>>> command out there that would match based on distance given the
>>>>> latitudes and longitudes?
```
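
For reference, the two suggestions in this thread (Nick Cox's `meanonly` option and Robert Picard's early exit from the loop) could be combined roughly as follows. This is only a sketch: it assumes that `d`, `id1`, `id2`, `matchid`, `matchd` and the local macro `nobs2006` already exist exactly as created in Robert's example, and it has not been run against the original poster's data.

```stata
* Sketch only: assumes d, id1, id2, matchid, matchd and the local
* nobs2006 were created as in Robert Picard's example in this thread.
forvalues i = 1/`nobs2006' {
    * -meanonly- skips the variance calculation but still returns r(min)
    summarize d, meanonly
    scalar mind = r(min)
    * all remaining distances are missing: nothing left to match
    if mi(mind) continue, break
    summarize id1 if d == mind, meanonly
    local bestid1 = r(min)
    summarize id2 if d == mind, meanonly
    local bestid2 = r(min)
    quietly replace matchid = `bestid2' if id1 == `bestid1'
    quietly replace matchd = mind if id1 == `bestid1'
    * remove both matched observations from further consideration
    quietly replace d = . if id1 == `bestid1' | id2 == `bestid2'
    display "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
}
```

Since `summarize` with `meanonly` displays nothing, the `quietly` prefix is not needed on those lines. When 2010 has fewer observations than 2006, the loop stops as soon as every 2010 observation has been used, so the leftover 2006 observations keep missing values in `matchid` and `matchd`.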