Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Robert Picard <picard@netbox.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Matching procedure based on shortest distance given latitudes and longitudes

Date: Thu, 9 Feb 2012 12:10:02 -0500

As I mentioned to you a few days ago, you do not need a special program to find the nearest neighbors. You can simply use -cross- to form all pairwise combinations of 2006 and 2010 observations, compute all the distances, and then sort. I've added some code that does, I think, the matching you describe.

Robert

*----------- begin example -------------
version 12
set seed 1234

* save 2010 observations separately
clear
set obs 10
gen id2 = _n
gen lat2 = 40 + runiform() * 5
gen lon2 = 19 + runiform() * 5
tempfile y2010
save "`y2010'"

* create 7 obs for 2006
clear
local nobs2006 7
set obs `nobs2006'
gen id1 = _n
gen lat1 = 40 + runiform() * 5
gen lon1 = 19 + runiform() * 5

* form all pairwise combinations and compute distance
cross using "`y2010'"

* user-written program, to install: ssc install geodist
geodist lat1 lon1 lat2 lon2, gen(d)
gen d0 = d
gen matchid = .
gen matchd = .

* greedy matching: each pass takes the closest remaining (id1, id2) pair,
* records it, then sets d to missing for every pair involving either id
forvalues i = 1/`nobs2006' {
    qui sum d
    scalar mind = r(min)
    qui sum id1 if d == mind
    local bestid1 = r(min)
    qui sum id2 if d == mind
    local bestid2 = r(min)
    qui replace matchid = `bestid2' if id1 == `bestid1'
    qui replace matchd = mind if id1 == `bestid1'
    qui replace d = . if id1 == `bestid1' | id2 == `bestid2'
    dis "id1=" `bestid1' " matched " "id2=" `bestid2' " at d = " mind
}

* order each 2006 obs's candidates by original distance (d0)
sort id1 d0 id2
*------------ end example --------------

2012/2/9 Rüdiger Vollmeier <ruediger.vollmeier@googlemail.com>:
> Hello guys,
>
> I want to match observations in each observation in a given year with
> one observation in another year based on the shortest geographical
> distance between them given the latitudes and longitudes of each
> observation.
>
> I.e. the simplified structure of the dataset looks as follows:
>
> id   year   longitude   latitude
> 1    2006   19.923      40.794
> 2    2006   19.949      40.711
> 1    2010   19.940      40.721
> 2    2010   22.001      50.122
>
> Hence, I would like to match each observation in 2006 with the one
> observation in 2010 that is closest AND that had not been matched to
> any observation in 2006 before.
>
> The previously discussed -nearstat- command (thanks to Wilner!) cannot
> be applied directly to this problem as it could match the same
> observation in 2010 with multiple observations in 2006 (i.e. in this
> example, the year 2010 observation with id 1 is closest to both
> observations in 2006 - and hence would be matched).
>
> Does anybody have an idea for a nice solution or is there even a
> command out there that would match based on distance given the
> latitudes and longitudes?
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
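The distance step in the reply relies on the user-written -geodist- command from SSC. Where installing it is not an option, one possible substitute is the haversine great-circle formula, sketched below in plain Stata. This is a sketch under stated assumptions, not a drop-in equivalent: it assumes a spherical Earth with a mean radius of 6371 km, whereas -geodist- by default measures distances on a reference ellipsoid, so results will differ slightly. The one-row dataset is hypothetical and simply pairs the first 2006 and 2010 coordinates from the quoted question; the variable name d_hav is likewise made up for illustration.

```stata
* haversine great-circle distance in km: a sketch that ASSUMES a spherical
* Earth of radius 6371 km (unlike -geodist-, which defaults to an ellipsoid)
clear
input id lat1 lon1 lat2 lon2
1 40.794 19.923 40.721 19.940
end

* assumed mean Earth radius in km
local r = 6371

* differences in latitude and longitude, converted from degrees to radians
gen double dlat = (lat2 - lat1) * _pi / 180
gen double dlon = (lon2 - lon1) * _pi / 180

* haversine term
gen double a = sin(dlat/2)^2 + cos(lat1*_pi/180) * cos(lat2*_pi/180) * sin(dlon/2)^2

* min(1, .) guards against rounding pushing sqrt(a) slightly above 1
gen double d_hav = 2 * `r' * asin(min(1, sqrt(a)))
drop dlat dlon a
list id d_hav
```

In the full example, the same -gen- statements could be run right after -cross-, with d_hav then used wherever the example uses d.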

**Follow-Ups**: Re: st: Matching procedure based on shortest distance given latitudes and longitudes (from Rüdiger Vollmeier <ruediger.vollmeier@googlemail.com>)

**References**: st: Matching procedure based on shortest distance given latitudes and longitudes (from Rüdiger Vollmeier <ruediger.vollmeier@googlemail.com>)
