[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: RE: exact command for distance ?

From   Roy Wada <>
To   <>
Subject   RE: st: RE: exact command for distance ?
Date   Sat, 12 Sep 2009 17:22:37 -0700

We can keep the discussion here if others are interested. And you are 
Yes, that is what I mean by grid-search. There is no point in comparing 
something that is not in the neighborhood.
The quickiest way I have found is the cut the map into overlapping grids. 
This is an old code that was in process of revision (about a year old), 
but it will do 2 million observations in about 5 minutes assuming random 
dispersion of points.
In Laura's case, she just need to reverse the comparison group.
set mem 1g
set obs 2000000
gen id=_n
set seed 123
gen longit=uniform()*10000
gen latit=uniform()*10000
cap program drop distance
program define distance
syntax [using], kernel(real) RADius(real)
gen X=round(longit,`kernel')
gen Y=round(latit,`kernel')
tempfile file0 file1
gen markX=.
save `file0'
replace markX=0 if markX==.
append using `file0'
replace markX=1 if markX==.
append using `file0'
replace markX=2 if markX==.
gen markY=.
save `file1'
replace markY=3 if markY==.
append using `file1'
replace markY=4 if markY==.
append using `file1'
replace markY=5 if markY==.
*sort id markX markY
replace X=X-`kernel' if markX==1
replace X=X+`kernel' if markX==2
replace Y=Y-`kernel' if markY==4
replace Y=Y+`kernel' if markY==5
gen mark=0 if markX==0 & markY==3
egen XY=group(X Y)
bys XY: gen count=_N
keep if count>1
* drop if the middle part (mark==0) is not there
bys XY: egen min=min(mark)
drop if min~=0
drop min

* renumber
drop XY
egen XY=group(X Y)
*** re-sorting is necessary
sort XY
drop count
bys XY: gen count=_N

*gen row=_n
*bys XY: gen begin=row if _n==1
*bys XY: replace begin=begin[_n-1] if begin[_n-1]~=.
*bys XY: gen end0=row if _n==_N
*bys XY: egen end=max(end0)
*drop end0
forval num=1/10 {
 gen match`num'=.
 gen dist`num'=.
egen max=max(XY)
local max=max
drop max
local place=1
while `place'=0 & `=mark[`first']'==0 {
   if `dist'<=`radius' & `dist'>0 & `=mark[`first']'==0 {
    di " `max' `=id[`first']' `=id[`second']' `dist'"
    local num=`num'+1
    qui replace match`num'=`=id[`second']' in `first'
    qui replace dist`num'=`dist' in `first'
 local place=`place'+`=count[`place']'

distance, kernel(.7) rad(.1)


> To quote GH Hardy: "I am reluctant to intrude in a discussion concerning
> matters of which I have no expert knowledge,..."
> But, it seems to me that for each farm, you could create a quick screen
> that would remove the polygon points that were unlikely to be close to
> the farm.
> If f_lat and f_long are the coordinates of the farm and the coordinates
> for each polygon point are p_lat and p_long, then
> gen fp_lat = f_lat - p_lat
> gen fp_long = f_long - p_long
> gen dist = sqrt(fp_lat^2 + fp_long^2)
> qui sum dist, detail
> keep if dist
> If all your farms and lakes are in the UK, you would need to add a
> constant to the longitudes before calculating the differences (modulo
> 180 or 360; I don't know how longitude is specified in the data).
Hotmail: Powerful Free email with security by Microsoft.
*   For searches and help try:

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index