Statalist



Re: st: RE: exact command for distance ?


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: exact command for distance ?
Date   Sat, 12 Sep 2009 07:48:07 -0400

Roy--
We do seem to be in some sort of twilight zone, a realm of asymmetric
rules about evidence and civility, but I see no contradiction in what
I have posted. I appreciate users posting code on SSC and elsewhere,
and for Laura's problem an easy solution is available (via an
unmatched merge and a loop over observations) without downloading any
code. That is not to say that a solution using a downloadable program
would not get there in fewer lines of code (or at least fewer lines of
code visible in an email). However, the same approach (unmatched merge
and loop over observations) works for any problem of matching one
dataset to another, with whatever calculation is needed done inside
the loop. -vincenty- provides better accuracy, but at a cost: one has
to download it, and it is slower than simpler calculations. That
accuracy may in fact be very important for problems where several
neighbors are at a similar distance from a point and it is crucial to
identify the actual nearest neighbor. This is not an issue for Laura,
who only wants the minimum distance and can tolerate a fairly large
error.

You do not seem to have answered my main point about -distmatch-: the
help file makes a claim about its relative speed that seems
unsupported by the evidence. I have not recommended that people avoid
downloading it, but I maintain that the help file is inaccurate and
should be amended. I also recommend that you add some guidance for
users looking for a solution to Laura's problem, which involves a
second dataset, since the examples in the help file do not, as they
stand, make clear how to approach the two-dataset problem.

I maintain that the code below is a simple and elegant solution, using
only built-in commands and one reasonably fast call to -merge- (the
whole thing might be slightly faster in Mata, but at the cost of
transparency). The code works just as well if the second file is a
polygon file of vertices, in which case I would label the variable
mindist "Distance to nearest body of water" without mentioning that it
is really the distance to the nearest vertex among all the polygons; a
suitably detailed polygon file will make that distance suitably
accurate.

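use farms, clear
local nf=_N                   // number of farms
g double mindist=.
* unmatched merge: pairs observations by row number, no key variable
* (Stata 11+ syntax: merge 1:1 _n using waterbodies)
merge using waterbodies
local R=6367.44               // radius of the Earth in km
local d2r=_pi/180             // degrees to radians
qui forv i=1/`nf' {
    * coordinates of farm i (Y = latitude, X = longitude)
    local lat1=farm_Y[`i']
    local lon1=farm_X[`i']
    * longitude difference to every water point, wrapped into [-pi, pi]
    g double L=(wat_X-(`lon1'))*`d2r'
    replace L=L-2*_pi if L<. & L>_pi
    replace L=L+2*_pi if L<-_pi
    * great-circle distance by the spherical law of cosines;
    * min(1,...) keeps rounding error from pushing acos() out of range
    g double d=acos(min(1,sin(wat_Y*`d2r')*sin(`lat1'*`d2r') ///
        +cos(wat_Y*`d2r')*cos(`lat1'*`d2r')*cos(L)))*`R' ///
        if !missing(L, wat_Y)
    su d, meanonly
    replace mindist=r(min) in `i'
    drop L d
}
* drop surplus rows when there are more water points than farms
keep in 1/`nf'
drop _merge waterbody_ID wat_X wat_Y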
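For reference, the quantity computed inside the loop is the
great-circle distance given by the spherical law of cosines. Reading
the code, farm_Y and wat_Y are taken to be latitudes (phi) and farm_X
and wat_X longitudes (lambda), converted to radians, with R = 6367.44
km; that mapping of variable names to symbols is my reading of the
code, not something stated in the post. mindist for farm i is then the
minimum over all water points j:

```latex
d_{ij} = R \arccos\!\big( \sin\varphi_i \sin\varphi_j
         + \cos\varphi_i \cos\varphi_j \cos(\lambda_j - \lambda_i) \big),
\qquad
\text{mindist}_i = \min_j d_{ij}
```

Because cos() is periodic, wrapping the longitude difference into
[-pi, pi] does not change d_ij; it is harmless but not strictly
required. The formula treats the Earth as a sphere, whereas Vincenty's
method works on an ellipsoid, which is where the extra accuracy of
-vincenty- comes from.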


On Fri, Sep 11, 2009 at 5:47 PM, Roy Wada <roywada@hotmail.com> wrote:
> Austin,
>
> Thanks for your feedback. You seem to be contradicting yourself
> on occasion, but some people do that now and then.
>
> If -vincenty- is critical, then why are you now recommending code
> not based on -vincenty-? You already know that -vincenty- makes no
> important difference for distances of less than 100 miles.
>
> Please do make the calculations for us and tell us how this
> will impact someone's research.
>
> I agree -distmatch- can be made to run faster (it should recycle
> previous rankings) but not for the reason you posted.
>
> You are forgetting to mention that your code cannot perform ranking
> or complete matching; it only looks for the minimum distance.
>
> This has been pointed out to you before.
>
> I would post another comparison except for the fact that your code
> does not work for other kinds of matching.
>
> You seem to be creating a moving target with ad hoc fixes, and
> suggest other people do the same. If they can do this, why would
> they need you?
>
> Are we stuck in a twilight zone where people who do not need help
> should in fact be made to take it when offered?
>
> There is something funny about people who claim exclusive expertise.
>
> Let's agree it is a very bad idea to tell other people not to use
> someone else's program.
>
> Roy
>
> P.S. You can take your downloaded programs to the data center just
> like other programs. Just put them in the current directory if you
> still do not know how to do this.
>
>
>> Roy--
>> I also have no problem downloading others' work, and my hard drive is
>> cluttered with the output of Jann, Baum, Schaffer, Jenkins, Cox, and
>> many others. I seem to use one of Ben Jann's programs every day. And
>> one of my posted solutions on this topic requires downloading
>> -vincenty- (from SSC), which gives much better distance estimates,
>> though at a substantial time cost.
>>
>> I am not even claiming that -distmatch- has no utility--no doubt many
>> will find it useful. But I'm afraid I don't see your point in this
>> post at all--you claim in the help file that -distmatch- "take several
>> minutes to complete" for 3000 obs, and other methods take "days if not
>> weeks" yet the method that I have outlined in several posts is
>> entirely general (i.e. it can be customized to produce any range of
>> statistics for any range of neighbors, which no program can claim to
>> do) and runs faster than -distmatch- in many cases, e.g. by a factor
>> of four or five here:
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


