st: RE: distance calculation and reshape

From   Martin Hällsten
To   <>
Subject   st: RE: distance calculation and reshape
Date   Fri, 19 Oct 2007 13:15:33 +0200



I have a problem with the reshape command, which seems extremely slow.


My data file contains x and y coordinates for n different points, and I want
to calculate the distance between each pair of points (using Pythagoras' theorem).

For various reasons, I want to calculate n*n distances rather than
(n*(n-1))/2 unique distances. 
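
To make the calculation concrete, here is a minimal illustration. The two points (0,0) and (3,4) and the local macro names are made up for the example and are not from my data; it only shows the distance formula and the two pair counts for n = 10.

```stata
* Illustration only: two hypothetical points, (0,0) and (3,4)
local x1 = 0
local y1 = 0
local x2 = 3
local y2 = 4

* Pythagoras: square root of the summed squared coordinate differences
display sqrt((`x2' - `x1')^2 + (`y2' - `y1')^2)

* Number of distances for n points: all ordered pairs vs. unique pairs
local n = 10
display `n'*`n'
display `n'*(`n' - 1)/2
```

The n*n count includes each point paired with itself (distance 0) and both orderings of every pair, which is what I want here.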


The following code simulates exactly the kind of data I have and then
reproduces the problem (the number of points is set to 10 here).



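* Simulate n points with random integer coordinates
local n = 10
range point 1 `n' `n'
gen x = int(abs(uniform()*10000000))
gen y = int(abs(uniform()*10000000))

* Copy every point's id and coordinates into its own set of variables (wide form)
local rows = _N
forvalues n1 = 1/`rows' {
      gen point_`n1' = point[`n1']
      gen x_`n1' = x[`n1']
      gen y_`n1' = y[`n1']
}

* One observation per pair of points (n*n rows), then the Euclidean distance
reshape long point_ x_ y_ , i(point) j(row)
gen xdiff = abs(x-x_)
gen ydiff = abs(y-y_)
gen distance  = sqrt((xdiff^2)+(ydiff^2))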


The problem is that I need to do this for 9230 points. Everything goes fine
(and fast) until the reshape command.

Then Stata seems to get stuck. I run this on a multiprocessor Windows Vista
machine using StataMP 9.2. I have set the memory to 25 GB, which is more than
sufficient. I have also set the maximum number of variables to 32000. So it
shouldn’t really be the machine that fails.

I had the above code with n = 9230 running for three days (72 hours!), but
the reshape command did not complete within that time.
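
For a sense of scale, the sizes implied by the code above can be worked out directly. This is only arithmetic on n = 9230: the wide data set holds the 3 original variables plus 3 new ones per point, and the long data set holds one observation per ordered pair of points.

```stata
* Sizes implied by the loop and reshape for n = 9230
local n = 9230

* Wide form: point, x, y plus point_#, x_#, y_# for each of the n points
display 3 + 3*`n'

* Long form: one observation for each of the n*n ordered pairs
display `n'^2
```

That is 27693 variables before the reshape (within the 32000 limit I set) and 85192900 observations after it.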


Does anyone have suggestions for how the calculation could be run faster, or
without using the reshape command? And why is reshape so tediously slow?




Martin Hällsten



