
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Matching procedure based on shortest distance given latitudes and longitudes


From   Robert Picard <[email protected]>
To   [email protected]
Subject   Re: st: Matching procedure based on shortest distance given latitudes and longitudes
Date   Thu, 9 Feb 2012 15:08:47 -0500

That works. A better way is to break out of the loop as soon as no pairs are left:

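forvalues i = 1/`nobs2006' {
	* smallest distance among the pairs still available
	qui sum d
	scalar mind = r(min)
	* no pairs left (e.g. fewer 2010 than 2006 observations): stop
	if mi(mind) continue, break
	qui sum id1 if d == mind
	local bestid1 = r(min)
	* take id2 from the same pair, in case two distances tie exactly
	qui sum id2 if d == mind & id1 == `bestid1'
	local bestid2 = r(min)
	qui replace matchid = `bestid2' if id1 == `bestid1'
	qui replace matchd = mind if id1 == `bestid1'
	* remove both members of the matched pair from further consideration
	qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
	dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
}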
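A minimal, self-contained sketch of the same greedy matching follows, offered as an alternative and not as the code used above. It assumes hypothetical random data with fewer 2010 (7) than 2006 (10) observations, so that the loop has to break early. It also replaces -geodist- with a hand-coded haversine (great-circle) distance on a sphere of radius 6371 km; as far as I know this only approximates -geodist-'s default ellipsoidal distance. Instead of -summarize-, it sorts on the remaining distance in each pass: missing values sort last in Stata, so the closest available pair is always in observation 1, and sorting on id1 and id2 as well keeps both ids from the same pair if distances tie.

```stata
*----------- begin sketch (hypothetical data) -------------
version 12
clear
set seed 1234

* hypothetical 2010 data: 7 locations (fewer than in 2006)
set obs 7
gen id2 = _n
gen lat2 = 40 + runiform() * 5
gen lon2 = 19 + runiform() * 5
tempfile y2010
save "`y2010'"

* hypothetical 2006 data: 10 locations
clear
local nobs2006 10
set obs `nobs2006'
gen id1 = _n
gen lat1 = 40 + runiform() * 5
gen lon1 = 19 + runiform() * 5

* all pairwise 2006 x 2010 combinations
cross using "`y2010'"

* haversine distance in km, assuming a spherical earth with R = 6371 km
gen double dlat = (lat2 - lat1) * _pi / 180
gen double dlon = (lon2 - lon1) * _pi / 180
gen double hav = sin(dlat / 2)^2 + cos(lat1 * _pi / 180) * cos(lat2 * _pi / 180) * sin(dlon / 2)^2
gen double d = 2 * 6371 * asin(sqrt(hav))
drop dlat dlon hav

gen matchid = .
gen double matchd = .
* working copy of the distances; matched pairs are set to missing
gen double dleft = d

forvalues i = 1/`nobs2006' {
	* missing values sort last, so the closest remaining pair is in obs 1
	sort dleft id1 id2
	* every 2010 observation is used up: stop early
	if missing(dleft[1]) continue, break
	local bestid1 = id1[1]
	local bestid2 = id2[1]
	scalar mind = dleft[1]
	quietly replace matchid = `bestid2' if id1 == `bestid1'
	quietly replace matchd = scalar(mind) if id1 == `bestid1'
	* neither member of the matched pair can be used again
	quietly replace dleft = . if id1 == `bestid1' | id2 == `bestid2'
}

* one row per 2006 observation: its matched pair, or any row if unmatched
keep if id2 == matchid | missing(matchid)
bysort id1 (id2): keep if _n == 1
list id1 matchid matchd
*------------ end sketch --------------
```

With 10 observations in 2006 and 7 in 2010, seven 2006 observations get a match and the other three keep a missing matchid. Note that this greedy rule (always take the closest remaining pair) gives a one-to-one match, but it does not necessarily minimize the total distance summed over all matched pairs.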

2012/2/9 Rüdiger Vollmeier <[email protected]>:
> What about an additional if condition?
>
> i.e.:
>
> forvalues i = 1/`nobs2006' {
>         qui sum d
>         if r(N) != 0 {
>                 scalar mind = r(min)
>                 qui sum id1 if d == mind
>                 local bestid1 = r(min)
>                 qui sum id2 if d == mind
>                 local bestid2 = r(min)
>                 qui replace matchid = `bestid2' if id1 == `bestid1'
>                 qui replace matchd = mind if id1 == `bestid1'
>                 qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>                 dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
>         }
> }
>
> On 9 February 2012 at 20:37, Rüdiger Vollmeier
> <[email protected]> wrote:
>> Thanks to Robert for this smart and elegant way of dealing with this problem.
>>
>> However, if there are fewer observations in 2010 than in 2006,
>> matchid=1 is generated for all 2006 id=1 observations, even though
>> there is no shortest distance associated with this id.
>>
>> What would be an equally elegant way of solving this problem?
>>
>> Ruediger.
>>
>>
>>
>>
>> 2012/2/9 Robert Picard <[email protected]>:
>>> As I mentioned to you a few days ago, you do not need a special
>>> program to find the nearest neighbors. You can simply use -cross- to
>>> form all pairwise combinations of 2006 and 2010 observations, compute
>>> all the distances, and then sort. I've added some code that does, I
>>> think, the matching you describe.
>>>
>>> Robert
>>>
>>> *----------- begin example -------------
>>> version 12
>>>
>>> set seed 1234
>>>
>>> * save 2010 observations separately
>>> clear
>>> set obs 10
>>> gen id2 = _n
>>> gen lat2 = 40 + runiform() * 5
>>> gen lon2 = 19 + runiform() * 5
>>> tempfile y2010
>>> save "`y2010'"
>>>
>>> * create 7 obs for 2006
>>> clear
>>> local nobs2006 7
>>> set obs `nobs2006'
>>> gen id1 = _n
>>> gen lat1 = 40 + runiform() * 5
>>> gen lon1 = 19 + runiform() * 5
>>>
>>> * form all pairwise combinations and compute distance
>>> cross using "`y2010'"
>>> * user-written program, to install: ssc install geodist
>>> geodist lat1 lon1 lat2 lon2, gen(d)
>>>
>>>
>>> gen d0 = d
>>> gen matchid = .
>>> gen matchd = .
>>>
>>> forvalues i = 1/`nobs2006' {
>>>        qui sum d
>>>        scalar mind = r(min)
>>>        qui sum id1 if d == mind
>>>        local bestid1 = r(min)
>>>        qui sum id2 if d == mind
>>>        local bestid2 = r(min)
>>>        qui replace matchid = `bestid2' if id1 == `bestid1'
>>>        qui replace matchd = mind if id1 == `bestid1'
>>>        qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>>>        dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
>>> }
>>>
>>> sort id1 d0 id2
>>>
>>> *------------ end example --------------
>>>
>>>
>>>
>>> 2012/2/9 Rüdiger Vollmeier <[email protected]>:
>>>> Hello guys,
>>>>
>>>> I want to match each observation in a given year with one
>>>> observation in another year, based on the shortest geographical
>>>> distance between them, given the latitude and longitude of each
>>>> observation.
>>>>
>>>> The simplified structure of the dataset is as follows:
>>>>
>>>> id   year   longitude   latitude
>>>> 1    2006   19.923      40.794
>>>> 2    2006   19.949      40.711
>>>> 1    2010   19.940      40.721
>>>> 2    2010   22.001      50.122
>>>>
>>>> Hence, I would like to match each observation in 2006 with the one
>>>> observation in 2010 that is closest AND that has not already been
>>>> matched to another observation in 2006.
>>>>
>>>> The previously discussed -nearstat- command (thanks to Wilner!) cannot
>>>> be applied directly to this problem, as it could match the same 2010
>>>> observation with multiple 2006 observations. In this example, the
>>>> 2010 observation with id 1 is closest to both 2006 observations and
>>>> hence would be matched to both.
>>>>
>>>> Does anybody have an idea for a nice solution or is there even a
>>>> command out there that would match based on distance given the
>>>> latitudes and longitudes?
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>



© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index