[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Austin Nichols <austinnichols@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: RE: exact command for distance ? |

Date |
Fri, 11 Sep 2009 14:00:37 -0400 |

Roy-- I also have no problem downloading others' work, and my hard drive is cluttered with the output of Jann, Baum, Schaffer, Jenkins, Cox, and many others. I seem to use one of Ben Jann's programs every day. And one of my posted solutions on this topic requires downloading -vincenty- (from SSC), which gives much better distance estimates, though at a substantial time cost. I am not even claiming that -distmatch- has no utility--no doubt many will find it useful. But I'm afraid I don't see your point in this post at all--you claim in the help file that -distmatch- "take several minutes to complete" for 3000 obs, and other methods take "days if not weeks" yet the method that I have outlined in several posts is entirely general (i.e. it can be customized to produce any range of statistics for any range of neighbors, which no program can claim to do) and runs faster than -distmatch- in many cases, e.g. by a factor of four or five here: clear tempfile p h range pid 1 3000 3000 set seed 123 g p_lat=32+uniform()*10 g p_lon=-120+uniform()*40 compress save `p' clear input h_lat h_lon 42.103 -80.065 42.103 -80.065001 39.739 -75.605 37.499 -77.470 39.464 -77.962 27.844 -82.796 39.138 -84.509 38.271 -85.694 36.143 -86.805 35.832 -86.073 36.313 -87.676 33.505 -86.801 32.288 -90.258 41.628 -93.659 44.412 -103.469 40.807 -96.625 35.608 -91.265 31.292 -92.461 31.080 -97.348 31.783 -106.474 46.612 -112.048 43.618 -116.194 41.242 -110.991 34.568 -112.456 40.698 -111.990 40.044 -111.716 46.053 -118.356 33.860 -118.149 34.093 -118.344 41.628 . end g id=_n expand 100 bys id: replace id=id*100+_n isid id replace h_lon=h_lon+uniform() replace h_lat=h_lat+uniform() compress save `h' su timer on 1 use `p' local np=_N g double dnear=. g long idnear=. merge using `h' local R=6367.44 forv i=1/`np' { local x1=p_lat[`i'] local y1=p_lon[`i'] local x2 h_lat local y2 h_lon qui { g double L=(`y2'-`y1')*_pi/180 replace L=(`y2'-`y1'-360)*_pi/180 if L<. & L>_pi replace L=(`y2'-`y1'+360)*_pi/180 if L<-_pi local t1 acos(sin(`x2'*_pi/180)*sin(`x1'*_pi/180) g double d=`t1'+cos(`x2'*_pi/180)*cos(`x1'*_pi/180)*cos(L))*`R' su d, meanonly replace dnear=r(min) in `i' su id if d==r(min), meanonly replace idnear=r(min) in `i' } drop L d } drop if _m==2 drop id h_lat h_lon _m la var dnear "Distance to nearest hospital" la var idnear "ID of nearest hospital" timer off 1 l in 1/20, noo clean tempfile m save `m' timer on 2 use `p' ren p_lat h_lat ren p_lon h_lon append using `h' distmatch, id(id) lat(h_lat) lon(h_lon) near(1) km keep if pid<. drop id h_lat h_lon timer off 2 joinby pid using `m', unm(both) ta _m drop _m order pid p_lat p_lon l in 1/20, noo clean timer list *Plus, if you are working in a confidential data center, there is no guarantee you can download a user-written program from SSC, so it seems worthwhile that you can get a solution with a few lines of regular code (one unmatched merge and a loop over observations). On Fri, Sep 11, 2009 at 12:29 PM, Roy Wada<roywada@hotmail.com> wrote: > I have no problem downloading and using programs written by other people. I > have not used one of Austin's programs, but it is nice to know that it's there > if I need it. Could I write a program on regression discontinuity from scatch? > Sure. Give me two hours. But why would I? I am very grateful to those who have > shared their program with me, and I hope they find my programs useful too. > > When a topic repeatedly come up on this list, it indicates an unaddressed > problem. Distance matcing is a topic of growing importance that is appearing > on this list with increasing frequency, despite the earlier assurance that > this is not a problem in search of a solution. > > The solution implemented in -distmatch- is simple yet has never been implemented > before. It is a non-intensive solution to an intensive problem. The number of > matches that must be considered is N x N (it's actually N choose 2 repleated N > or N-1 times, depending on how you count). This is a large number. > > The problem with proposing a simple solution is that some people like to entertain > themselves thinking they could have done it on their own. > > But it has not been done before, and the problem keeps coming back to this list, > as I said before. > > How difficult is distance matching? My first stab was remarkably similar to another > program called -nearest- from ssc, which according to Nick Cox was not meant to > earn a good grade in any computer science course. I am guessing this is the most > obvious solution because this is also the one that Austin was suggesting. > > My first program literally took several months to run with observations of about > 30,000. I tried paralleling the codes (multiple computers), -merge-, grid-searching, > etc, before settling on the current form, which is at least 100 times faster than the > first one. > > This rewriting of the program occurred over the course of two years. If > someone can do this in one sitting, go ahead. Good for them. > > But anyone thinking that a casual user can be shown how to do this over the > Statalist is wasting everyone's time, which was clearly the case. > > The current non-Stata solution, widely used by economists, is to use ArcGIS or > ArcMap. They cost about $2000-$6000. They usually take about several days if > not weeks of user-work. If you are using confidential data center (they usually > charge by the hour), that's another $2000 in expenses. Good luck using the latest > versions of these programs because they are even more difficult to use. Be grateful > if you never had to use one of these. > > Roy > >> Laura-- >> You don't actually need to download anything to solve this kind of >> problem, or much harder similar problems, as illustrated by e.g. >> http://www.stata.com/statalist/archive/2009-07/msg00261.html >> http://www.stata.com/statalist/archive/2007-01/msg00098.html >> and similar posts. >> >> I particularly doubt the final claim in the help file for -distmatch- >> in the paragraph "Distance matching is computationally intensive. >> Observations of 3,000 may take several minutes to complete. Other >> methods typically take days if not weeks and requires extensive >> user-involvement." >> >> use farms, clear >> local nf=_N >> g double mindist=. >> merge using waterbodies >> local R=6367.44 >> qui forv i=1/`nf' { >> local x1=farm_Y[`i'] >> local y1=farm_X[`i'] >> local x2 wat_Y >> local y2 wat_X >> g double L=(`y2'-`y1')*_pi/180 >> replace L=(`y2'-`y1'-360)*_pi/180 if L_pi >> replace L=(`y2'-`y1'+360)*_pi/180 if L<-_pi >> local t1 acos(sin(`x2'*_pi/180)*sin(`x1'*_pi/180) >> g double d=`t1'+cos(`x2'*_pi/180)*cos(`x1'*_pi/180)*cos(L))*`R' >> su d, meanonly >> replace mindist=r(min) in `i' >> drop L d >> } >> drop _m waterbody_ID wat_X wat_Y >> la var mindist "Distance to center of nearest body of water" >> or adapt as appropriate... > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**RE: st: RE: exact command for distance ?***From:*Roy Wada <roywada@hotmail.com>

**References**:**st: RE: exact command for distance ?***From:*Laura Platchkov <LMP881@bham.ac.uk>

**Re: st: RE: exact command for distance ?***From:*Austin Nichols <austinnichols@gmail.com>

**RE: st: RE: exact command for distance ?***From:*Roy Wada <roywada@hotmail.com>

- Prev by Date:
**st: Table command limited to 5 Statistics?** - Next by Date:
**st: Testing for a particular value and replacing in last observation?** - Previous by thread:
**RE: st: RE: exact command for distance ?** - Next by thread:
**RE: st: RE: exact command for distance ?** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |