Re: st: Matching procedure based on shortest distance given latitudes and longitudes


From   Rüdiger Vollmeier <ruediger.vollmeier@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Matching procedure based on shortest distance given latitudes and longitudes
Date   Thu, 9 Feb 2012 20:38:10 +0100

What about wrapping the loop body in an additional -if- condition, so that
nothing is matched once no unmatched distances remain? I.e.:

forvalues i = 1/`nobs2006' {
       qui sum d
       if r(N) != 0 {
              scalar mind = r(min)
              qui sum id1 if d == mind
              local bestid1 = r(min)
              qui sum id2 if d == mind
              local bestid2 = r(min)
              qui replace matchid = `bestid2' if id1 == `bestid1'
              qui replace matchd = mind if id1 == `bestid1'
              qui replace d = . if id1 == `bestid1' | id2 == `bestid2'
              dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
       }
}
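
An untested variation on the same idea, as a minimal sketch: rather than wrapping the loop body in an -if- block, the loop could be left with -continue, break- as soon as -summarize- finds no non-missing distance. The sketch assumes the variable names from Robert's example (id1, id2, d, matchid, matchd) and the user-written -geodist- from SSC. The sample sizes (5 observations in 2006, only 3 in 2010) and the tag1 variable are made up for illustration.

```stata
version 12
set seed 1234

* hypothetical data: only 3 observations in 2010 ...
clear
set obs 3
gen id2 = _n
gen lat2 = 40 + runiform() * 5
gen lon2 = 19 + runiform() * 5
tempfile y2010
save "`y2010'"

* ... but 5 observations in 2006
clear
local nobs2006 5
set obs `nobs2006'
gen id1 = _n
gen lat1 = 40 + runiform() * 5
gen lon1 = 19 + runiform() * 5

* all pairwise combinations and their distances
* (user-written program, to install: ssc install geodist)
cross using "`y2010'"
geodist lat1 lon1 lat2 lon2, gen(d)

gen d0 = d
gen matchid = .
gen matchd = .

forvalues i = 1/`nobs2006' {
       qui sum d
       * every 2010 observation is used up: stop instead of
       * "matching" on a missing minimum distance
       if r(N) == 0 continue, break
       scalar mind = r(min)
       qui sum id1 if d == mind
       local bestid1 = r(min)
       qui sum id2 if d == mind
       local bestid2 = r(min)
       qui replace matchid = `bestid2' if id1 == `bestid1'
       qui replace matchd = mind if id1 == `bestid1'
       qui replace d = . if id1 == `bestid1' | id2 == `bestid2'
       dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
}

sort id1 d0 id2

* one row per 2006 observation; unmatched ones keep matchid == .
egen byte tag1 = tag(id1)
list id1 matchid matchd if tag1
```

With these sizes, three of the five 2006 observations should end up with a match and the other two should keep a missing matchid. Both variants do the same matching; -continue, break- only skips the remaining, empty iterations.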

On 9 February 2012 at 20:37, Rüdiger Vollmeier
<ruediger.vollmeier@googlemail.com> wrote:
> Thanks to Robert for this smart and elegant way of dealing with this problem.
>
> However, if there are fewer observations in 2010 than in 2006,
> matchid=1 is generated for all 2006 id=1 observations, even though
> there is no shortest distance associated with this id.
>
> What would be an equally elegant way of solving this problem?
>
> Ruediger.
>
>
>
>
> 2012/2/9 Robert Picard <picard@netbox.com>:
>> As I mentioned to you a few days ago, you do not need a special
>> program to find the nearest neighbors. You can simply use -cross- to
>> form all pairwise combinations of 2006 and 2010 observations, compute
>> all the distances, and then sort. I've added some code that does, I
>> think, the matching you describe.
>>
>> Robert
>>
>> *----------- begin example -------------
>> version 12
>>
>> set seed 1234
>>
>> * save 2010 observations separately
>> clear
>> set obs 10
>> gen id2 = _n
>> gen lat2 = 40 + runiform() * 5
>> gen lon2 = 19 + runiform() * 5
>> tempfile y2010
>> save "`y2010'"
>>
>> * create 7 obs for 2006
>> clear
>> local nobs2006 7
>> set obs `nobs2006'
>> gen id1 = _n
>> gen lat1 = 40 + runiform() * 5
>> gen lon1 = 19 + runiform() * 5
>>
>> * form all pairwise combinations and compute distance
>> cross using "`y2010'"
>> * user-written program, to install: ssc install geodist
>> geodist lat1 lon1 lat2 lon2, gen(d)
>>
>>
>> gen d0 = d
>> gen matchid = .
>> gen matchd = .
>>
>> forvalues i = 1/`nobs2006' {
>>        qui sum d
>>        scalar mind = r(min)
>>        qui sum id1 if d == mind
>>        local bestid1 = r(min)
>>        qui sum id2 if d == mind
>>        local bestid2 = r(min)
>>        qui replace matchid = `bestid2' if id1 == `bestid1'
>>        qui replace matchd = mind if id1 == `bestid1'
>>        qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
>>        dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
>> }
>>
>> sort id1 d0 id2
>>
>> *------------ end example --------------
>>
>>
>>
>> 2012/2/9 Rüdiger Vollmeier <ruediger.vollmeier@googlemail.com>:
>>> Hello guys,
>>>
>>> I want to match each observation in a given year with
>>> one observation in another year based on the shortest geographical
>>> distance between them, given the latitudes and longitudes of each
>>> observation.
>>>
>>> I.e. the simplified structure of the dataset looks as follows:
>>>
>>> id      year      longitude    latitude
>>> 1       2006      19.923       40.794
>>> 2       2006      19.949       40.711
>>> 1       2010      19.940       40.721
>>> 2       2010      22.001       50.122
>>>
>>> Hence, I would like to match each observation in 2006 with the one
>>> observation in 2010 that is closest AND that has not already been
>>> matched to another observation in 2006.
>>>
>>> The previously discussed -nearstat- command (thanks to Wilner!) cannot
>>> be applied directly to this problem as it could match the same
>>> observation in 2010 with multiple observations in 2006 (i.e. in this
>>> example, the year 2010 observation with id 1 is closest to both
>>> observations in 2006 - and hence would be matched).
>>>
>>> Does anybody have an idea for a nice solution or is there even a
>>> command out there that would match based on distance given the
>>> latitudes and longitudes?
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>


