# st: RE: distance calculation and reshape

 From Martin Hällsten <[email protected]> To <[email protected]> Subject st: RE: distance calculation and reshape Date Fri, 19 Oct 2007 13:15:33 +0200

```Hi,

I have a problem with the reshape command which seems extremely slow.

My data file contains x and y coordinates for n different points, and I want
to calculate the distance between each pair of points (using Pythagoras' theorem).

For various reasons, I want to calculate n*n distances rather than
(n*(n-1))/2 unique distances.

The following code simulates exactly the kind of data I have and then
performs the calculation (the number of points n is set to 10 here).

// SIMULATION

local n = 10

range point 1 `n' `n'

gen x = int(abs(uniform()*10000000))

gen y = int(abs(uniform()*10000000))

// CALCULATION

local rows = _N

forvalues n1 = 1/`rows' {

gen point_`n1' = point[`n1']

gen x_`n1' = x[`n1']

gen y_`n1' = y[`n1']

}

reshape long point_ x_ y_ , i(point) j(row)

gen xdiff = abs(x-x_)

gen ydiff = abs(y-y_)

gen distance  = sqrt((xdiff^2)+(ydiff^2))

The problem is that I need to do this for 9230 points. Everything goes fine
(and fast) until the reshape command.

Then Stata seems to get stuck. I run this on a multiprocessor Windows Vista
machine using StataMP 9.2. I have set the memory to 25 GB, which is more
than sufficient. I have also set the maximum number of variables to 32000.
So it shouldn't really be the machine that fails.

I had the above loop with n = 9230 running for three days (72 hours!), but
the reshape command couldn't complete within that time.

Does anyone have suggestions how the calculation could be run faster/without
using the reshape command? And why is reshape so tediously slow?

Thanks

Martin Hällsten

[email protected]

```
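One possible way to get all n*n pairs without the wide layout and reshape is to pair the dataset with a renamed copy of itself using -cross-. The sketch below is untested against the poster's real data and uses the same simulated setup with n = 10. The tempfile name `other` and the use of `double` for the distance are assumptions made for illustration. With n = 9230 the result would have 9230^2 = 85,192,900 observations, so memory needs should be checked before running it at full size.

```stata
* Simulate n points, as in the post (uniform() is the pre-Stata 10 name)
clear
set obs 10
gen point = _n
gen x = int(uniform()*10000000)
gen y = int(uniform()*10000000)

* Save a copy of the points under the suffixed names used in the post
preserve
rename point point_
rename x x_
rename y y_
tempfile other
save "`other'"
restore

* Form every pairwise combination: n observations become n*n
cross using "`other'"

* Euclidean distance; squaring makes abs() unnecessary, and double
* avoids float rounding on squared differences of this magnitude
gen double distance = sqrt((x - x_)^2 + (y - y_)^2)
```

After -cross-, each observation holds one (point, point_) pair, which is the same long layout the reshape in the post was meant to produce, including the zero distances where point equals point_.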