



Re: st: looping with geodist


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: looping with geodist
Date   Fri, 7 May 2010 11:23:45 -0400

Frederick Guy <f.guy@bbk.ac.uk> :
There are numerous examples addressing your need in the Archives, e.g.:

http://www.stata.com/statalist/archive/2009-09/msg00473.html
http://www.stata.com/statalist/archive/2009-07/msg00261.html
http://www.stata.com/statalist/archive/2007-01/msg00098.html

Note also that the distance calculation occupies a large fraction of
msg00473's code, but need not.  It computes the distance between two
points on Earth given in decimal degrees lat/lon, using an
approximation that assumes the Earth is a sphere (see -vincenty- on
SSC for an alternative).  All the calculations could be telescoped
into one line (it's just easier to break them up), and the local
macros are mostly unnecessary.  Plus, the formula in that message is
the weakest of many alternatives for great-circle distance; see e.g.
http://en.wikipedia.org/wiki/Great-circle_distance (though downloading
a package for any of those approximate spherical computations seems
like overkill).
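
For what it's worth, here is a minimal sketch of one of those
alternatives, the haversine formula, in a few -generate- lines. The
variable names lat1, lon1, lat2, lon2 and the two test points are made
up for illustration; 6367.44 km is simply the radius used in the code
further down in this message, and the result is still a spherical
approximation.

```stata
* Sketch only: haversine great-circle distance in km.
* lat1, lon1, lat2, lon2 are hypothetical variables in decimal degrees.
clear
set obs 2
gen double lat1 = 0
gen double lon1 = 0
gen double lat2 = 0
* obs 1: a quarter of the way round the equator; obs 2: the same point
gen double lon2 = cond(_n==1, 90, 0)

* half the lat/lon differences, converted to radians (deg*_pi/180/2)
gen double hlat = (lat2-lat1)*_pi/360
gen double hlon = (lon2-lon1)*_pi/360

* a = sin^2(dlat/2) + cos(lat1)*cos(lat2)*sin^2(dlon/2)
gen double a = sin(hlat)^2 + cos(lat1*_pi/180)*cos(lat2*_pi/180)*sin(hlon)^2

* distance = 2*R*asin(sqrt(a)), with R = 6367.44 km assumed
gen double d_km = 2*6367.44*asin(sqrt(a))
list lat1 lon1 lat2 lon2 d_km
```

The first observation should come out at a quarter of the great
circle (6367.44*_pi/2 km) and the second at zero.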

As far as I know, the unmatched merge approach was first promulgated
in January 2007 as a way to have two datasets in memory at once; see
e.g. http://www.stata.com/statalist/archive/2007-01/msg00082.html
(the name came later).  The approach was developed in 2003 for a paper
published in 2009 in the JHE; see also Appendix A of
http://www.nber.org/papers/w13246 if you are interested in inverse
distance weights.  Another way is to repeatedly merge or append a
second dataset onto a single observation from the first, but this is
understandably less efficient.  The crucial detail to remember with an
unmatched merge strategy (merging on _n rather than on any variables)
is that all the variable names must be distinct across the two
datasets.
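
To make the idea concrete, here is a small sketch of an unmatched
merge on two made-up datasets held in tempfiles. The coordinates are
invented, and -merge 1:1 _n- is used as the Stata 11 spelling of the
older -merge using-. As far as I understand -merge-, a variable that
exists in both files keeps the values from the data in memory, which
is why the names need to be distinct.

```stata
* Sketch only: two tiny hypothetical datasets with distinct names.
clear
set obs 3
gen double xj = 38 + _n
gen double yj = -77 - _n
tempfile type_j
save `type_j'

clear
set obs 2
gen double xi = 40 + _n
gen double yi = -75 - _n

* Pair rows by observation number only; there are no key variables.
* The result has max(Ni, Nj) = 3 obs, with xi and yi missing in obs 3.
merge 1:1 _n using `type_j', nogenerate
list
```

After the merge, any type i observation can be referenced by
subscript (e.g. xi[2]) while the type j coordinates sit alongside as
ordinary variables.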

Suppose your location variables are xi,yi,xj,yj (x is latitude and y
is longitude, both in decimal degrees) and you have Ni obs of type i
and Nj obs of type j.  If you want the distance to each location of
type j stored on the type i obs, you will need Nj new variables to
store distances; if you only want summary stats across locations of
type j, you should not create that many new variables at once, to
conserve memory.  Suppose you want the sum of inverse distances
(unweighted here, and assuming no distance is zero); then you could
just:

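* x = latitude, y = longitude, both in decimal degrees
use type_i, clear
local Ni=_N
* unmatched merge: rows are paired by _n, so names must not overlap
merge using type_j
g double w=.
qui forv i=1/`Ni' {
 * longitude difference in radians, wrapped into [-pi,pi]
 g double L=(yj-yi[`i'])*_pi/180
 replace L=(yj-yi[`i']-360)*_pi/180 if L<. & L>_pi
 replace L=(yj-yi[`i']+360)*_pi/180 if L<-_pi
 * spherical law of cosines, with a radius of 6367.44 km
 local t1 sin(xj*_pi/180)*sin(xi[`i']*_pi/180)
 local t2 cos(xj*_pi/180)*cos(xi[`i']*_pi/180)*cos(L)
 g double inv=1/(acos(`t1'+`t2')*6367.44)
 su inv, meanonly
 * r(sum) is the sum of inv over all type j locations
 replace w=r(sum) in `i'
 drop L inv
}
la var w "Sum of Inverse (Approx) Distances"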
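
If you do want each distance stored on the type i obs (the Nj new
variables case), a sketch along the same lines might loop over the
type j locations instead. The data here are made up so that the
answers are easy to check, the names d1, d2, ... are just
illustrative, and the same spherical formula and radius are assumed.

```stata
* Sketch only: one distance variable per type j location.
* x = latitude, y = longitude, decimal degrees; all values hypothetical.
clear
set obs 3
gen double xj = 0
gen double yj = 90*(_n-1)
tempfile type_j
save `type_j'
local Nj = _N

clear
set obs 2
gen double xi = 0
gen double yi = 0
local Ni = _N
merge 1:1 _n using `type_j', nogenerate

forvalues j = 1/`Nj' {
    * longitude difference in radians, wrapped into [-pi,pi]
    gen double L = (yj[`j'] - yi)*_pi/180 if _n <= `Ni'
    replace L = L - 2*_pi if L < . & L > _pi
    replace L = L + 2*_pi if L < -_pi
    * spherical law of cosines, radius 6367.44 km assumed
    local s sin(xj[`j']*_pi/180)*sin(xi*_pi/180)
    local c cos(xj[`j']*_pi/180)*cos(xi*_pi/180)*cos(L)
    gen double d`j' = acos(`s' + `c')*6367.44
    drop L
}
list xi yi d* in 1/`Ni'
```

With 1800 or so locations of type j this creates that many new
variables, which is the memory cost mentioned above; the summary
approach avoids it.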


On Fri, May 7, 2010 at 4:35 AM, Frederick Guy <f.guy@bbk.ac.uk> wrote:
> Robert Picard sent the code below, which works as advertised - many thanks, Robert! Now I have a slightly different problem: I have two kinds of locations in the data, i and j. For each location of type i, I need to compute the distances to every location of type j. If I just stack observations type i on top of observations type j, geodist doesn't like the missing values (observations type i have missing values for type j, and vice versa). Can anybody suggest a solution?
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Robert Picard
> Sent: 30 April 2010 17:09
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: RE: RE: AW: Creating index relative to other observations
>
> Perhaps the following example is close to what you are trying to do.
> It loops through all observations. Each time, it calculates the
> distance from observation `i' to all others (distance will be missing
> for the observation `i'). Values for variable x1 are adjusted
> according to the distance to `i' and summed. The observation `i' of x3
> is then updated with the value of the sum plus the value of x2 for
> observation `i'.
>
> Hope this helps,
>
> Robert
> http://robertpicard.com/
>
> *--------------------------- begin example -----------------------
> version 11
>
> * This example requires my -geodist- program available on SSC
> * To install: ssc install geodist
>
> clear all
> set obs 5
> set seed 1234
> gen lat = 37 + (41 - 37) * uniform()
> gen lon = -109 + (109 - 102) * uniform()
> gen x1 = round(uniform()*100)
> gen x2 = round(uniform()*100)
> gen x3 = .
>
> forvalues i = 1/`c(N)' {
>        geodist lat lon `=lat[`i']' `=lon[`i']' if _n != `i', gen(d)
>        gen xtemp = x1 / d
>        sum xtemp, meanonly
>        qui replace x3 = r(sum) + x2 in `i'
>        list
>        drop d xtemp
> }
> *--------------------- end example --------------------------
>
>
> On Fri, Apr 30, 2010 at 7:49 AM, Frederick Guy <f.guy@bbk.ac.uk> wrote:
>> Many thanks. Now for a crash-course in MATA...
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: 29 April 2010 19:22
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: RE: RE: AW: Creating index relative to other observations
>>
>> I'd do this in Mata. Mata has a -for- loop.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Frederick Guy
>>
>> Thanks, I guess I was unclear on this aspect of the problem. For each
>> observation, the sum I'm talking about is of measurements made relative
>> to all other observations (or more generally, to some set of other
>> observations) in the sample.
>>
>> Martin Weiss
>>
>> ".. sum up the results of these computations,".
>>
>> Creating sums can mean different things in Stata. It may sound trite,
>> but
>> the easiest is simply to -generate- a sum by adding values with a "+"
>> sign.
>> If you want the total of a variable, look at -egen, total()-. If you
>> want a
>> running sum, take a look at -help sum()-.
>>
>> Frederick Guy
>>
>> I have need to use information from all observations (about 1800 of
>> them) to create a new variable.
>>
>> The variable created is a weighted sum of the inverse of geographical
>> distances between observation i and all j n.e. i. I have longitude and
>> latitude for each observation, and computation of the distance from any
>> i to any j is straightforward. What I don't know is how to get Stata to
>> loop over all observations and sum up the results.
>>
>> For every observation i, I think I need to
>>
>> (a) loop through all j n.e. i, doing computations involving variables
>> x1, x2(i) and x1, x2(j), and then
>>
>> (b) sum up the results of these computations, returning a value which
>> becomes variable x3 for that i.
>>
>> I expect there's a straightforward way to do this. Any suggestions?
>>

