Roy--

We do seem to be in some sort of twilight zone, a realm of asymmetric rules about evidence and civility, but I see no contradiction in what I have posted--I appreciate users posting code on SSC and elsewhere, and in Laura's problem an easy solution is available (via an unmatched merge and a loop over observations) without downloading any code, which is not to say that a solution using a downloadable program would not get there in fewer lines of code (or at least lines of code visible in an email). However, a similar approach to mine (unmatched merge and loop over observations) works for any type of problem of matching one dataset to another, with various different calculations done inside the loop.

-vincenty- provides better accuracy, but at a cost (one has to download it, and it is slower than simpler calculations), though the accuracy may in fact be very important for some problems where several neighbors are a similar distance from a point and it is crucial to find the actual nearest neighbor (this is not an issue for Laura, who only wants minimum distance, and can tolerate a fairly large error).

My main point about -distmatch- you do not seem to have answered: the help file makes a claim about its relative speed that seems unsupported by the evidence. I have not recommended that people not download it, but I maintain that the help file is inaccurate, and should be redacted. I also recommend you add some guidance for folks looking for a solution to Laura's problem, involving a second dataset, as the examples in the help file don't seem to be transparent to users as they stand, at least on how to approach the two-dataset problem.

I maintain that the code below is a simple and elegant solution, using only built-in commands and one reasonably fast call to -merge- (the whole thing might be slightly faster in Mata, but at a cost of lost transparency).
The code works just as well if the second file is a polygon file, in which case I would label the variable mindist "Distance to nearest body of water" without mentioning it is the nearest vertex of all polygons to which we are measuring distance; a suitably detailed polygon file will make the distance suitably accurate.

use farms, clear
local nf=_N
g double mindist=.
* unmatched merge: waterbody coordinates sit beside the farm rows
merge using waterbodies
* approximate Earth radius in km
local R=6367.44
qui forv i=1/`nf' {
 * latitude (x1) and longitude (y1) of farm i, in degrees
 local x1=farm_Y[`i']
 local y1=farm_X[`i']
 local x2 wat_Y
 local y2 wat_X
 * longitude difference in radians, wrapped into [-pi, pi]
 g double L=(`y2'-`y1')*_pi/180
 replace L=(`y2'-`y1'-360)*_pi/180 if L<. & L>_pi
 replace L=(`y2'-`y1'+360)*_pi/180 if L<-_pi
 * great-circle distance from farm i to every waterbody point
 local t1 acos(sin(`x2'*_pi/180)*sin(`x1'*_pi/180)
 g double d=`t1'+cos(`x2'*_pi/180)*cos(`x1'*_pi/180)*cos(L))*`R'
 su d, meanonly
 replace mindist=r(min) in `i'
 drop L d
 }
drop _m waterbody_ID wat_X wat_Y

On Fri, Sep 11, 2009 at 5:47 PM, Roy Wada <roywada@hotmail.com> wrote:
> Austin,
>
> Thanks for your feedback. You seem to be contradicting yourself
> on occasions but some people do that now and then.
>
> If vincenty is critical, then why are you now recommending codes
> not based on vincenty? You already know that vincenty makes no
> important differences for the distance less than 100 miles.
>
> Please do make the calculations for us and tell us how this
> will impact someone's research.
>
> I agree -distmatch- can be made to run faster (it should recycle
> previous rankings) but not for the reason you posted.
>
> You are forgetting to mention that your codes cannot perform ranking
> or complete matching. It only looks for the minimum distance.
>
> This has been pointed out to you before.
>
> I would post another comparison except for the fact that your codes
> do not work for other matchings.
>
> You seem to be creating a moving target with ad hoc fixes, and
> suggest other people do the same. If they can do this, why would
> they need you?
>
> Are we stuck in the twilight zone where people do not need help
> but in fact should be made to take one when offered.
>
> There is something funny about people who claim exclusive expertise.
>
> Let's agree it is a very bad idea to tell other people to not use
> someone else's program.
>
> Roy
>
> P.S. You can take your download programs to the data center just
> like other programs. Just put it in the current directory if you
> still do not know how to do this.
>
>
>> Roy--
>> I also have no problem downloading others' work, and my hard drive is
>> cluttered with the output of Jann, Baum, Schaffer, Jenkins, Cox, and
>> many others. I seem to use one of Ben Jann's programs every day.

Roy--

I also have no problem downloading others' work, and my hard drive is cluttered with the output of Jann, Baum, Schaffer, Jenkins, Cox, and many others. I seem to use one of Ben Jann's programs every day. And one of my posted solutions on this topic requires downloading -vincenty- (from SSC), which gives much better distance estimates, though at a substantial time cost.

I am not even claiming that -distmatch- has no utility--no doubt many will find it useful. But I'm afraid I don't see your point in this post at all. You claim in the help file that -distmatch- "take several minutes to complete" for 3000 obs, and that other methods take "days if not weeks", yet the method that I have outlined in several posts is entirely general (i.e. it can be customized to produce any range of statistics for any range of neighbors, which no program can claim to do) and runs faster than -distmatch- in many cases, e.g. by a factor of four or five here:

