Statalist



RE: st: RE: exact command for distance ?


From   Roy Wada <roywada@hotmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: exact command for distance ?
Date   Fri, 11 Sep 2009 09:29:44 -0700

I have no problem downloading and using programs written by other people. I 
have not used any of Austin's programs, but it is nice to know that they are 
there if I need them. Could I write a program on regression discontinuity from 
scratch? Sure. Give me two hours. But why would I? I am very grateful to those 
who have shared their programs with me, and I hope they find my programs useful too.

When a topic repeatedly comes up on this list, it indicates an unaddressed 
problem. Distance matching is a topic of growing importance that is appearing 
on this list with increasing frequency, despite the earlier assurance that 
this is not a problem in search of a solution.
 
The solution implemented in -distmatch- is simple yet has never been implemented 
before. It is a computationally light solution to a computationally intensive 
problem. The number of matches that must be considered is N x N (it's actually 
N choose 2 repeated N or N-1 times, depending on how you count). This is a 
large number.
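
To put a rough number on this, here is a quick check in Stata. It is only a sketch: N = 30,000 is assumed for illustration (the sample size of the slow first run described further down), and only the two simplest counts are shown, all ordered pairs and all unordered pairs.

```stata
* Assumed for illustration: N = 30,000 observations
* All ordered pairs, N x N
display %15.0fc 30000^2
* All unordered pairs, N choose 2 = N*(N-1)/2
display %15.0fc 30000*29999/2
```

The first line displays 900,000,000 and the second 449,985,000, so even the smaller count is hundreds of millions of distance calculations.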
 
The problem with proposing a simple solution is that some people like to entertain 
themselves thinking they could have done it on their own.
 
But it has not been done before, and the problem keeps coming back to this list, 
as I said before.
 
How difficult is distance matching? My first stab was remarkably similar to another 
program called -nearest- from ssc, which according to Nick Cox was not meant to 
earn a good grade in any computer science course. I am guessing this is the most 
obvious solution because this is also the one that Austin was suggesting.

My first program literally took several months to run on about 30,000 
observations. I tried parallelizing the code (across multiple computers), 
-merge-, grid searching, etc., before settling on the current form, which is 
at least 100 times faster than the first one.
 
This rewriting of the program occurred over the course of two years. If 
someone can do this in one sitting, go ahead. Good for them.
 
But anyone thinking that a casual user can be shown how to do this over the 
Statalist is wasting everyone's time, which was clearly the case.
 
The current non-Stata solution, widely used by economists, is to use ArcGIS or 
ArcMap. They cost about $2000-$6000 and usually take several days, if not 
weeks, of user work. If you are using a confidential data center (they usually 
charge by the hour), that's another $2000 in expenses. Good luck using the latest 
versions of these programs because they are even more difficult to use. Be grateful 
if you have never had to use one of these.
 
Roy

> Laura--
> You don't actually need to download anything to solve this kind of
> problem, or much harder similar problems, as illustrated by e.g.
> http://www.stata.com/statalist/archive/2009-07/msg00261.html
> http://www.stata.com/statalist/archive/2007-01/msg00098.html
> and similar posts.
>
> I particularly doubt the final claim in the help file for -distmatch-
> in the paragraph "Distance matching is computationally intensive.
> Observations of 3,000 may take several minutes to complete. Other
> methods typically take days if not weeks and requires extensive
> user-involvement."
> 
> use farms, clear
> local nf=_N
> g double mindist=.
> merge using waterbodies
> local R=6367.44
> qui forv i=1/`nf' {
> local x1=farm_Y[`i']
> local y1=farm_X[`i']
> local x2 wat_Y
> local y2 wat_X
> g double L=(`y2'-`y1')*_pi/180
> replace L=(`y2'-`y1'-360)*_pi/180 if L>_pi
> replace L=(`y2'-`y1'+360)*_pi/180 if L<-_pi
> local t1 acos(sin(`x2'*_pi/180)*sin(`x1'*_pi/180)
> g double d=`t1'+cos(`x2'*_pi/180)*cos(`x1'*_pi/180)*cos(L))*`R'
> su d, meanonly
> replace mindist=r(min) in `i'
> drop L d
> }
> drop _m waterbody_ID wat_X wat_Y
> la var mindist "Distance to center of nearest body of water"
> or adapt as appropriate...
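
The distance formula in the quoted loop (the spherical law of cosines) can be checked on a single pair of points before running it over a whole dataset. The following is a minimal sketch, not part of either program: the two points are made up (both on the equator, one degree of longitude apart), the local macro names are hypothetical, and the radius is the 6367.44 km used in the quoted code. Run it as a do-file, since it uses /// line continuation.

```stata
* Hypothetical test points: both on the equator, one degree of longitude apart
local R = 6367.44
local lat1 = 0
local lon1 = 0
local lat2 = 0
local lon2 = 1
* Longitude difference in radians
local L = (`lon2' - `lon1')*_pi/180
* Spherical law of cosines, same form as in the quoted loop
local d = acos(sin(`lat2'*_pi/180)*sin(`lat1'*_pi/180) + ///
    cos(`lat2'*_pi/180)*cos(`lat1'*_pi/180)*cos(`L'))*`R'
display "Distance in km: " %9.2f `d'
```

One degree along the equator should come out at roughly 111 km with this radius, which gives a quick way to catch swapped latitude and longitude arguments.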
 