
From: Martin Hällsten <martin.hallsten@sofi.su.se>
To: "'Austin Nichols'" <austinnichols@gmail.com>, <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: distance calculation and reshape
Date: Fri, 19 Oct 2007 21:52:20 +0200

Austin,

First, the machine I use has 34 GB of physical RAM, so it shouldn't depend on the hard drive.

Second, this seems to be exactly what I need. Some preliminary runs on my own desktop (not on the more powerful machine used previously) revealed that the computation time of the "post" procedure is a perfectly linear function of the number of distances, whereas the time for my lousy "reshape" procedure is a quadratic function. So the estimated time for the "post" with 9 230 points is slightly below 4 hrs.

n points             100       200       350       500       750      1000
n distances       10 000    40 000   122 500   250 000   562 500 1 000 000
seconds:
  post                 1         6        20        39        90       161
  reshape              2        12        41        96       324       850

I can't wait to try this next week. Thank you very much!

Martin Hällsten

BTW, isn't a "linear" version of reshape warranted?

-----Original Message-----
From: Austin Nichols [mailto:austinnichols@gmail.com]
Sent: 19 October 2007 16:25
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: distance calculation and reshape

Martin Hällsten <martin.hallsten@sofi.su.se>:
Plain text email only, please!

You want to wind up with 87m obs, which may tax your computer no matter how you do it. That said, you are calculating _N^2 (100 in the example, 87m in the data) x and y differences, then computing the distance off those. However, there are only _N*(_N-1)/2 distinct values (45 or 43m) to calculate for the x and y differences, as the relevant matrix of calculations is symmetric. I.e. the distance from i to j is the same as the distance from j to i. It would be faster to do fewer calculations, then expand the data, since copying values is faster than computing them.

But let's do every calculation as you request, and just use -post- to write the results of calculations to disk, instead of making variables and using -reshape-. Depending on your memory and hard disk configuration, you may get a big speed improvement this way.
I am guessing you don't have 25GB of physical memory, which means Stata is using your hard drive as memory, which makes everything much slower. Try setting memory to no more than one half of your physical RAM.

clear
set mem 60m
set seed 12347
local n = 500
range point 1 `n' `n'
gen long x = int(abs(uniform()*10000000))
gen long y = int(abs(uniform()*10000000))
local rows = _N
loc tm=real(substr("$S_TIME",4,2))
loc t=60*`tm'+real(substr("$S_TIME",7,2))
set rmsg off
tempfile dta
postfile t p r px py rx ry dist using `dta', replace
forvalues n = 1/`rows' {
 forvalues i = 1/`rows' {
  loc p=point[`n']
  loc r=point[`i']
  loc px=y[`n']
  loc py=x[`n']
  loc rx=y[`i']
  loc ry=x[`i']
  loc d=sqrt(((`px'-`rx')^2)+((`py'-`ry')^2))
  post t (`p') (`r') (`px') (`py') (`rx') (`ry') (`d')
 }
}
postclose t
loc tm=real(substr("$S_TIME",4,2))
loc t=60*`tm'+real(substr("$S_TIME",7,2))-`t'
loc tm=real(substr("$S_TIME",4,2))
loc s=60*`tm'+real(substr("$S_TIME",7,2))
forvalues n1 = 1/`rows' {
 gen int point_`n1' = point[`n1']
 gen long x_`n1' = x[`n1']
 gen long y_`n1' = y[`n1']
}
reshape long point_ x_ y_ , i(point) j(r)
gen xdiff = abs(x-x_)
gen ydiff = abs(y-y_)
gen distance = sqrt((xdiff^2)+(ydiff^2))
loc tm=real(substr("$S_TIME",4,2))
loc s=60*`tm'+real(substr("$S_TIME",7,2))-`s'
ren point p
sort p r
joinby p r using `dta'
compare d*
di as res "Timings:"
di "Post: " `t' _n "Reshape: " `s'

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
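The symmetry remark in the quoted message (only _N*(_N-1)/2 distinct distances, then expand) is not coded in the thread. A minimal sketch of how it might look follows, run as a do-file. The postfile handle h, the tempfile half, the indicator mirror, and the sample size of 100 are illustrative assumptions, not names from the original code, and the generate() option of -expand- is assumed to be available in your Stata version.

```stata
* Sketch only: post each unordered pair (i < j) once, then mirror it.
clear
set seed 12347
local n = 100                        // small illustrative sample size
set obs `n'
gen long point = _n
gen long x = int(uniform()*10000000)
gen long y = int(uniform()*10000000)

tempfile half
postfile h long(p r) double(dist) using `half', replace
local last = `n' - 1
forvalues i = 1/`last' {
    local first = `i' + 1
    forvalues j = `first'/`n' {
        * one distance per unordered pair {i, j}
        post h (point[`i']) (point[`j']) ///
            (sqrt((x[`i']-x[`j'])^2 + (y[`i']-y[`j'])^2))
    }
}
postclose h

use `half', clear
expand 2, generate(mirror)           // mirror = 1 marks the copied rows
gen long tmp = p
replace p = r   if mirror            // swap p and r in the copies so that
replace r = tmp if mirror            // (j, i) carries the distance of (i, j)
drop tmp mirror
sort p r
```

The result has _N*(_N-1) rows, one per ordered pair of different points. The zero distances from each point to itself are left out here and would have to be appended separately if the full _N^2 layout is wanted.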

References: Re: st: RE: distance calculation and reshape (From: "Austin Nichols" <austinnichols@gmail.com>)

