Statalist



Re: st: RE: distance calculation and reshape


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: distance calculation and reshape
Date   Fri, 19 Oct 2007 10:24:33 -0400

Martin Hällsten <martin.hallsten@sofi.su.se>:

Plain text email only, please!

You want to wind up with 87m obs, which may tax your
computer no matter how you do it. That said, you are
calculating _N^2 (100 in the example, 87m in the data) x
and y differences, then computing the distances from those.
However, there are only _N*(_N-1)/2 distinct values (45 or
43m) to calculate, because the relevant matrix of
calculations is symmetric with a zero diagonal: the
distance from i to j is the same as the distance from j to
i, and the distance from i to itself is zero. It would be
faster to do fewer calculations and then expand the data,
since copying values is faster than computing them.
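
To make the symmetry point concrete, here is a minimal sketch
(not run against your data) that posts each unordered pair only
once. It assumes point, x and y are in memory as in the example
further down; the handle h, the tempfile half and the variables
i, j and d are names made up here for illustration.

```stata
* Sketch only: one record per unordered pair with i < j,
* so _N*(_N-1)/2 distances are computed instead of _N^2.
* h, half, i, j and d are hypothetical names.
local rows = _N
tempfile half
postfile h i j d using `half', replace
forvalues i = 1/`=`rows'-1' {
  local first = `i' + 1
  forvalues j = `first'/`rows' {
    post h (point[`i']) (point[`j']) (sqrt((x[`i']-x[`j'])^2+(y[`i']-y[`j'])^2))
  }
}
postclose h
```

To get back to the full _N^2 rows you would then append a copy
with i and j swapped, plus the _N zero-distance rows for i = j.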

But let's do every calculation as you request, and just
use -post- to write the results to disk as they are
computed, instead of making variables and using -reshape-.
Depending on your memory and hard disk configuration, you
may get a big speed improvement this way.  I am guessing
you don't have 25GB of physical memory, which means Stata
is using your hard drive as virtual memory, which makes
everything much slower.  Try setting memory to no more
than one half of your physical RAM.

The example below runs both approaches on 500 random
points, times each, and uses -compare- to check that the
two sets of distances agree.

clear
set mem 60m
set seed 12347
local n = 500
range point 1 `n' `n'
gen long x = int(abs(uniform()*10000000))
gen long y = int(abs(uniform()*10000000))
local rows = _N
loc tm=real(substr("$S_TIME",4,2))
loc t=60*`tm'+real(substr("$S_TIME",7,2))
set rmsg off
tempfile dta
* post one record per ordered pair of points straight to disk
postfile t p r px py rx ry dist using `dta', replace
forvalues n = 1/`rows' {
  forvalues i = 1/`rows' {
    loc p=point[`n']
    loc r=point[`i']
    loc px=x[`n']
    loc py=y[`n']
    loc rx=x[`i']
    loc ry=y[`i']
    loc d=sqrt(((`px'-`rx')^2)+((`py'-`ry')^2))
    post t (`p') (`r') (`px') (`py')  (`rx') (`ry') (`d')
  }
}
postclose t
* elapsed seconds for the -post- approach
loc tm=real(substr("$S_TIME",4,2))
loc t=60*`tm'+real(substr("$S_TIME",7,2))-`t'

* start the clock for the -reshape- approach
loc tm=real(substr("$S_TIME",4,2))
loc s=60*`tm'+real(substr("$S_TIME",7,2))
forvalues n1 = 1/`rows' {
     gen int point_`n1' = point[`n1']
     gen long x_`n1' = x[`n1']
     gen long y_`n1' = y[`n1']
}
reshape long point_ x_ y_ , i(point) j(r)
gen xdiff = abs(x-x_)
gen ydiff = abs(y-y_)
gen distance  = sqrt((xdiff^2)+(ydiff^2))
loc tm=real(substr("$S_TIME",4,2))
loc s=60*`tm'+real(substr("$S_TIME",7,2))-`s'

* merge in the posted results and check the two distances agree
ren point p
sort p r
joinby p r using `dta'
compare d*
di as res "Timings:"
di "Post: " `t' _n "Reshape: " `s'
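
A caveat on the timings: they use only the minutes and seconds
of S_TIME, so a run that crosses the top of an hour gives a
wrong (possibly negative) number. A sketch of the same
bookkeeping with -timer-, assuming your Stata version has it
(timer numbers 1 and 2 are arbitrary choices, and the comment
lines stand in for the two blocks above):

```stata
timer clear
timer on 1
* ... the -post- loop goes here ...
timer off 1
timer on 2
* ... the -reshape- block goes here ...
timer off 2
* -timer list- shows elapsed seconds and leaves them in r(t1), r(t2)
timer list
di "Post: " r(t1) _n "Reshape: " r(t2)
```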



