
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: distance calculation and reshape
Date: Fri, 19 Oct 2007 10:24:33 -0400

Martin Hällsten <martin.hallsten@sofi.su.se>:

Plain text email only, please!

You want to wind up with 87m obs, which may tax your computer no matter how you do it. That said, you are calculating _N^2 (100 in the example, 87m in the data) x and y differences, then computing the distances from those. However, there are only _N*(_N-1)/2 distinct values (45 in the example, about 43m in the data) to calculate for the x and y differences, as the relevant matrix of calculations is symmetric, i.e. the distance from i to j is the same as the distance from j to i. It would be faster to do fewer calculations and then expand the data, since copying values is faster than computing them.

But let's do every calculation as you request, and just use -post- to write the results of the calculations to disk, instead of making variables and using -reshape-. Depending on your memory and hard disk configuration, you may get a big speed improvement this way.

I am guessing you don't have 25GB of physical memory, which means Stata is using your hard drive as memory, which makes everything much slower. Try setting memory to no more than one half of your physical RAM.
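As an aside, the symmetric shortcut described above (compute each pair once, then copy) might be sketched roughly as follows. This is only an illustration on 10 made-up points, not a tested replacement for the full calculation; the names h, half, p, r, dist, tmp and copy are placeholders chosen for the sketch, and the zero distance from each point to itself is simply left out.

```stata
* Sketch only: compute each unordered pair once, then mirror the results.
* All names (h, half, p, r, dist, tmp, copy) are illustrative assumptions.
clear
set seed 12347
set obs 10
gen long point = _n
gen long x = int(uniform()*10000000)
gen long y = int(uniform()*10000000)
local rows = _N
local last = `rows' - 1

tempfile half
postfile h long p long r double dist using `half', replace
forvalues i = 1/`last' {
    local first = `i' + 1
    forvalues j = `first'/`rows' {
        * distance from point i to point j, computed once per pair
        post h (point[`i']) (point[`j']) (sqrt((x[`i']-x[`j'])^2 + (y[`i']-y[`j'])^2))
    }
}
postclose h

use `half', clear
* _N*(_N-1)/2 = 45 pairs for 10 points
count

* Mirror the pairs: (j,i) is copied from (i,j) rather than recomputed
expand 2
bysort p r: gen byte copy = _n == 2
gen long tmp = p
replace p = r if copy
replace r = tmp if copy
drop tmp copy
* 90 ordered pairs; the 10 zero self-distances are omitted
count
```

Only the 45 distances are actually computed; the other 45 rows are copies with p and r swapped. The full _N^2 version requested in the original question follows.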
clear
set mem 60m
set seed 12347
local n = 500
range point 1 `n' `n'
gen long x = int(abs(uniform()*10000000))
gen long y = int(abs(uniform()*10000000))
local rows = _N

* Method 1: -post- every pairwise distance to a file on disk.
* Timer counts minutes and seconds of $S_TIME only (it ignores the hour).
loc tm=real(substr("$S_TIME",4,2))
loc t=60*`tm'+real(substr("$S_TIME",7,2))
set rmsg off
tempfile dta
postfile t p r px py rx ry dist using `dta', replace
forvalues n = 1/`rows' {
    forvalues i = 1/`rows' {
        loc p=point[`n']
        loc r=point[`i']
        loc px=x[`n']
        loc py=y[`n']
        loc rx=x[`i']
        loc ry=y[`i']
        loc d=sqrt(((`px'-`rx')^2)+((`py'-`ry')^2))
        post t (`p') (`r') (`px') (`py') (`rx') (`ry') (`d')
    }
}
postclose t
loc tm=real(substr("$S_TIME",4,2))
loc t=60*`tm'+real(substr("$S_TIME",7,2))-`t'

* Method 2: make wide variables, then -reshape- (the original approach).
loc tm=real(substr("$S_TIME",4,2))
loc s=60*`tm'+real(substr("$S_TIME",7,2))
forvalues n1 = 1/`rows' {
    gen int point_`n1' = point[`n1']
    gen long x_`n1' = x[`n1']
    gen long y_`n1' = y[`n1']
}
reshape long point_ x_ y_ , i(point) j(r)
gen xdiff = abs(x-x_)
gen ydiff = abs(y-y_)
gen distance = sqrt((xdiff^2)+(ydiff^2))
loc tm=real(substr("$S_TIME",4,2))
loc s=60*`tm'+real(substr("$S_TIME",7,2))-`s'

* Check that both methods give the same distances, then report timings.
ren point p
sort p r
joinby p r using `dta'
compare d*
di as res "Timings:"
di "Post: " `t' _n "Reshape: " `s'

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

