# Re: st: RE: Data management

From: "Roy,Suryadipta"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Data management
Date: Wed, 14 Feb 2007 10:19:19 -0600

Dear Scott and Michael,

Thank you so much for the replies! My previous question was actually part of a bigger problem, and I really need some help with it. I have two datasets. The first is an unbalanced panel of 150 countries covering 1980-2000, and looks like the following:

    country  year  x1  x2  x3 (= x1 + x2)
    A1       1980   2   1    3
             ..    ..  ..   ..
             2000   8   5   13
    A2       1980  10   5   15
             ..    ..  ..   ..
             2000  18  15   33
    .        ..    ..  ..   ..
    .        ..    ..  ..   ..
    A150     1980   3   2    5
             ..    ..  ..   ..
             2000   6   4   10

The second dataset covers, say, 200 countries and gives the distance between each pair of countries. It looks like the following:

    origin  destination  distance (km)
    A1      A2           100
    A1      A3           150
    .       .            .
    A1      A200          80
    A2      A1           100
    A2      A3           150
    .       .            .
    A2      A200         250
    .       .            .
    .       .            .
    A200    A1            80
    A200    A2           250
    .       .            .
    .       .            .
    A200    A199          50

What I want to do is the following:
generate a variable ("remote") for each country, where

remote(A1) = [x3(A2) / (X - x3(A2))] * distance(A1,A2) + [x3(A3) / (X - x3(A3))] * distance(A1,A3) + ... + [x3(A150) / (X - x3(A150))] * distance(A1,A150)

Here X is the sum of x3 over all countries, and, for example, distance(A1,A2) = 100 and distance(A1,A3) = 150. I would then like to compute "remote" in the same way for every country in the first dataset.
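To make the arithmetic concrete, here is a minimal sketch of one reading of the formula, written in Python rather than Stata purely for illustration. It assumes that each partner's weight is its x3 divided by (total x3 minus that partner's x3), that the calculation is done separately for each year, and that only countries present in the panel enter the sum. The function name `remoteness`, the dictionary layout, and the x3 value for A3 are hypothetical; the other numbers are taken from the listings above.

```python
def remoteness(x3, distance):
    """Return {country: remote} for a single year.

    x3:       {country: x3 value} for the panel countries in that year
    distance: {(origin, destination): distance in km}
    Assumed weight for partner j: x3[j] / (total x3 - x3[j]).
    """
    total = sum(x3.values())
    remote = {}
    for origin in x3:
        acc = 0.0
        for partner, value in x3.items():
            if partner == origin:
                continue  # a country has no distance to itself
            acc += value / (total - value) * distance[(origin, partner)]
        remote[origin] = acc
    return remote


# Toy 1980 data: A1 and A2 come from the panel listing, A3 is made up.
x3_1980 = {"A1": 3, "A2": 15, "A3": 5}
distance = {
    ("A1", "A2"): 100, ("A1", "A3"): 150,
    ("A2", "A1"): 100, ("A2", "A3"): 150,
    ("A3", "A1"): 150, ("A3", "A2"): 150,
}

remote_1980 = remoteness(x3_1980, distance)
```

Calling the function once per year would give one "remote" value per country-year. Countries that appear only in the distance dataset are ignored, because the loop runs over the panel countries.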

At this point, it seems really difficult to me, and I would greatly appreciate any help.

Thanks again,