Re: st: Matching procedure based on shortest distance given latitudes and longitudes

From	Robert Picard <[email protected]>
To	[email protected]
Subject	Re: st: Matching procedure based on shortest distance given latitudes and longitudes
Date	Thu, 9 Feb 2012 12:10:02 -0500

As I mentioned to you a few days ago, you do not need a special
program to find the nearest neighbors. You can simply use -cross- to
form all pairwise combinations of 2006 and 2010 observations, compute
all the distances, and then sort. I've added some code that does, I
think, the matching you describe.

Robert

*----------- begin example -------------
version 12

set seed 1234

* save 2010 observations separately
clear
set obs 10
gen id2 = _n
gen lat2 = 40 + runiform() * 5
gen lon2 = 19 + runiform() * 5
tempfile y2010
save "`y2010'"

* create 7 obs for 2006
clear
local nobs2006 7
set obs `nobs2006'
gen id1 = _n
gen lat1 = 40 + runiform() * 5
gen lon1 = 19 + runiform() * 5

* form all pairwise combinations and compute distance
cross using "`y2010'"
* user-written program, to install: ssc install geodist
geodist lat1 lon1 lat2 lon2, gen(d)


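* keep the original distances; d is set to missing as pairs are used up
gen d0 = d
gen matchid = .
gen matchd = .

* greedy one-to-one matching: on each pass, take the closest remaining
* pair, record it, then remove both observations from further matching
* (assumes no exact ties in d)
forvalues i = 1/`nobs2006' {
	* shortest distance among the pairs still available
	qui sum d
	scalar mind = r(min)
	* ids of the pair at that distance
	qui sum id1 if d == mind
	local bestid1 = r(min)
	qui sum id2 if d == mind
	local bestid2 = r(min)
	* record the match on every row of the 2006 observation
	qui replace matchid = `bestid2' if id1 == `bestid1'
	qui replace matchd = mind if id1 == `bestid1'
	* exclude both matched observations from later passes
	qui replace d = . if id1 ==  `bestid1' | id2 ==  `bestid2'
	dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
}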

sort id1 d0 id2

*------------ end example --------------
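
A possible follow-up check, not part of the original example: the sketch below assumes the example has just been run, so the crossed data with id1, id2, d0, matchid and matchd are still in memory. It keeps the single row per 2006 observation that holds its matched 2010 partner and confirms that no 2010 observation was used twice. The tolerance in the -assert- is an arbitrary choice made for illustration.

```stata
* assumes the example above has just been run and its data are in memory
preserve

* keep the one row per 2006 observation that holds its matched partner
keep if id2 == matchid

* one-to-one check: each id1 and each id2 should now appear only once
isid id1
isid id2

* the recorded match distance should agree with the original distance
* (1e-6 is an illustrative tolerance, not taken from the original post)
assert reldif(d0, matchd) < 1e-6

list id1 id2 matchd, noobs
restore
```

If either -isid- stops with an error, a 2006 or 2010 observation appears in more than one matched pair, which the greedy loop is meant to rule out.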



2012/2/9 Rüdiger Vollmeier <[email protected]>:
> Hello guys,
>
> I want to match each observation in a given year with one
> observation in another year based on the shortest geographical
> distance between them, given the latitude and longitude of each
> observation.
>
> I.e. the simplified structure of the dataset looks as follows:
>
> id      year      longitude    latitude
> 1       2006      19.923       40.794
> 2       2006      19.949       40.711
> 1       2010      19.940       40.721
> 2       2010      22.001       50.122
>
> Hence, I would like to match each observation in 2006 with the one
> observation in 2010 that is closest AND that had not been matched to
> any observation in 2006 before.
>
> The previously discussed -nearstat- command (thanks to Wilner!) cannot
> be applied directly to this problem as it could match the same
> observation in 2010 with multiple observations in 2006 (i.e. in this
> example, the year 2010 observation with id 1 is closest to both
> observations in 2006 - and hence would be matched).
>
> Does anybody have an idea for a nice solution or is there even a
> command out there that would match based on distance given the
> latitudes and longitudes?
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

