Statalist The Stata Listserver


st: AW: More efficient way of programming

From   "Stephan Brunow"
Subject   st: AW: More efficient way of programming
Date   Tue, 6 Jun 2006 12:16:59 +0200


There might be another way.
I do not know whether it is more efficient or less time-consuming, but it
might work:

Reshape the data set to long form, with one row per pair of stations:

id_i	id_j	dist
1	1	0
1	2	23
1	3	21
1	2500	530
and so on.

Get the shortest distance (the condition dist>0 excludes each station's
zero distance to itself):
.by id_i, sort: egen mindist=min(dist) if dist>0

Now look for the nearest station:
.gen helpvar=mindist-dist

helpvar is zero for the closest station. You can run a quick check first
and then get the id (with a small workaround):

.tab mindist (...this is the check)
.gen helpnear_id=id_j if helpvar==0
.replace helpnear_id=0 if helpnear_id==.
.by id_i, sort: egen near_id=max(helpnear_id)
.drop helpnear_id helpvar

Finally, you might reshape again to get the result back in wide (matrix)
form.
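
The steps above can be sketched end to end on a toy data set. This is only
a minimal sketch, not tested on the real 2500-station file: the three
stations and their distances are made up for illustration, the variable
names id, d1-d3, min_dis and near_id follow the question, and the sort-based
lookup is a swapped-in alternative to the helpvar route, not the same
commands.

```stata
* Toy data with 3 stations (hypothetical values), in the wide layout of the question
clear
input id d1 d2 d3
1  0 23 21
2 23  0 32
3 21 32  0
end

* Wide -> long: one row per (id, id_j) pair; the stub d holds the distance
reshape long d, i(id) j(id_j)
rename d dist

* Drop each station's zero distance to itself
drop if id == id_j

* Within each station, sort by distance (ties broken by the lower id_j)
* and take the first row as the nearest station
bysort id (dist id_j): gen near_id = id_j[1]
by id: gen min_dis = dist[1]

* Keep one row per station
by id: keep if _n == 1
keep id near_id min_dis
list, noobs
```

The result has one row per station, so it could be merged back onto the
original wide file by id instead of reshaping the whole data set again.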

However, I do not know whether this is faster than 1.4 hours, since reshape
is itself fairly time-consuming :-)
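
If reshape turns out to be the bottleneck, one possible alternative is to
stay in the wide layout and loop over the distance variables only, so that
each replace works on all observations at once. The sketch below is an
assumption on my part and has not been timed; it uses made-up data for three
stations and the variable names from the question, and it rebuilds min_dis
along the way rather than relying on an existing one.

```stata
* Toy data with 3 stations (hypothetical values), wide layout as in the question
clear
input id d1 d2 d3
1  0 23 21
2 23  0 32
3 21 32  0
end

* Running minimum and the station that attains it, both starting as missing
gen min_dis = .
gen near_id = .

* One pass over the distance variables (use 1/2500 for the full data).
* A missing min_dis counts as larger than any number, so the first
* non-self distance always enters; ties keep the lower station id.
forvalues j = 1/3 {
    replace near_id = `j'  if id != `j' & d`j' < min_dis
    replace min_dis = d`j' if id != `j' & d`j' < min_dis
}

list id min_dis near_id, noobs
```

This needs one loop of 2500 iterations instead of 2500 x 2500, and no
reshape at all.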


Stephan Brunow
MSc. in Economics und Diplom-Verkehrswirtschaftler 
Professur für VWL, insb. Makroökonomik und
Institut für Wirtschaft und Verkehr
Fakultät für Verkehrswissenschaften „Friedrich List"
Technische Universität Dresden
D-01062 Dresden 

Phone: ++49-(0)351-463-36806
Fax: ++49-(0)351-463-36819

-----Original Message-----
On behalf of Jitian Sheu
Sent: Tuesday, 6 June 2006 11:41
Subject: st: More efficient way of programming

Dear listers:

I have a data set with the following structure:

id   d1   d2   d3  ...  d2500   min_dis
1    0    23   21  ...  530     21
2    23   0    ...
(up to 2500)

That is, there are 2500 observations, each representing one station (id).
   dX = the distance to station X, X = 1, ..., 2500
   (since there are 2500 observations, I have 2500 distance variables)

   min_dis = the distance to the nearest station.

So, for each observation (station), I know the minimum distance to another
station. Now I want to know the id of its nearest station, i.e. I want to
create another variable (say, near_id) that holds the id number of each
observation's nearest station.

For example (using the above data)
id   d1   d2   d3  ...  d2500   min_dis  ==>  near_id
1    0    23   21  ...  530     21       ==>  3
2    23   0    32  ...  41      23       ==>  1
3    21   32   0   ...  52      21       ==>  1

For this purpose, I use the following code, which basically works
observation by observation:

gen near_id=.

forvalues i = 1(1)2500 {
    forvalues j = 1(1)2500 {
        replace near_id = `j' if id == `i' &

Therefore, there are 2500 x 2500 inner iterations in total.
If each pass of the outer loop takes 2 seconds, I need 2500 x 2 = 5000
seconds to finish the whole process, which is about 1.4 hours.

Is there a more efficient way to do this?

Many thanks.

