Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Fwd: st: RE: Calculate Distance between Properties within Portfolios


From   Robert Picard <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Fwd: st: RE: Calculate Distance between Properties within Portfolios
Date   Mon, 16 Sep 2013 12:31:25 -0400

Indeed, the -long- option for -geonear- is different and
does not merge back with the original data because multiple
neighbors are allowed, which could lead to a "m:m" merge.

For futur reference, here's an revised version that shows
how to use -geonear- in long form mode for every value of a
variable.

Robert

* --------------- begin example ---------------------------

set seed 1234
clear
set obs 20
gen porfolio_id = 1000 + _n
expand runiform() * 360 + 2
sort porfolio_id
by porfolio_id: gen propid = _n
sort porfolio_id propid
gen double lat = runiform()
gen double lon = runiform()
tempfile main
save "`main'"

* brute force approach
rename (propid lat lon) (propid0 lat0 lon0)
joinby porfolio_id using "`main'"
drop if propid == propid0
isid porfolio_id propid propid0, sort
geodist lat lon lat0 lon0, gen(km_brute) sphere
sort porfolio_id propid km_brute propid0
by porfolio_id propid: keep if _n == 1
tempfile brute
save "`brute'"

* using -geonear- from SSC
use "`main'", clear
clonevar propid0 = propid
save "`main'", replace
levelsof porfolio_id, clean
local pidlist `r(levels)'
local npid 0
tempfile nbors
qui foreach pid of local pidlist {
  use if porfolio_id == `pid' using "`main'", clear
  save "`nbors'", replace
  geonear propid lat lon using "`nbors'", ///
     n(propid0 lat lon) ignore long
  gen porfolio_id = `pid'
  local ++npid
  tempfile res`npid'
  save "`res`npid''"
}
clear
forvalues i=1/`npid' {
  append using "`res`i''"
}

rename propid0 propid0_gn
merge 1:1 porfolio_id propid using "`brute'", nogen
assert propid0_gn == propid0
assert abs(km_brute - km_to_propid0) < 1e-12

* --------------- end example -----------------------------



---------- Forwarded message ----------
From: S. McKay Price <[email protected]>
Date: Mon, Sep 16, 2013 at 11:14 AM
Subject: Re: st: RE: Calculate Distance between Properties within Portfolios
To: [email protected]


Thank you Joe and Robert for your insight and the elegant coding
example.  Both approaches calculated the distances between properties.

As expected, the brute force approach (using -joinby- ) created
millions of observations and took about five minutes to run (my
machine has 16GB of RAM).

The -geonear- approach using the loop generated the distances too,
although I was unable to get it to keep the unique portfolio
identifiers (for later merging and analysis) when specifying the
-long- option.  That is to say, when I slightly altered Robert's code
to include "long within(24000)" as follows:

* --------------- begin example ----------------------------

* using -geonear- from SSC
use "`main'", clear
sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
  use if pfolio == `i' using "`main'", clear
  save "`nbors'", replace
  geonear propid lat lon using "`nbors'", ///
     n(propid lat lon) ignore long within(24000)

  tempfile res`i'
  save "`res`i''"
}
clear
forvalues i = 1/`npid' {
  append using "`res`i''"
}

* --------------- end example -----------------------------

Thanks again!  Your suggestions were most helpful and accomplished
what I needed.
McKay

On 9/11/2013 1:29 AM, Robert Picard wrote:
>
> As Joe said, -joinby- is the tool to go if you are going to
> do this using a brute force approach. You can also get the
> big guns and use -geonear- (from SSC). You will have to do
> each portfolio separately but it's still going to be faster
> than the brute force approach.
>
> * --------------- begin example ---------------------------
>
> set seed 1234
> clear
> set obs 20
> gen porfolio_id = 1000 + _n
> egen pfolio = group(porfolio_id)
> expand runiform() * 360 + 2
> sort pfolio
> by pfolio: gen propid = _n
> sort pfolio propid
> gen double lat = runiform()
> gen double lon = runiform()
> tempfile main
> save "`main'"
>
> * brute force approach
> rename (propid lat lon) (propid0 lat0 lon0)
> joinby pfolio using "`main'"
> drop if propid == propid0
> isid pfolio propid propid0, sort
> geodist lat lon lat0 lon0, gen(km_brute) sphere
> sort pfolio propid km_brute propid0
> by pfolio propid: keep if _n == 1
> tempfile brute
> save "`brute'"
>
> * using -geonear- from SSC
> use "`main'", clear
> sum pfolio, meanonly
> local npid = r(max)
> tempfile nbors
> qui forvalues i = 1/`npid' {
>    use if pfolio == `i' using "`main'", clear
>    save "`nbors'", replace
>    geonear propid lat lon using "`nbors'", ///
>       n(propid lat lon) ignore
>    tempfile res`i'
>    save "`res`i''"
> }
> clear
> forvalues i = 1/`npid' {
>    append using "`res`i''"
> }
> merge 1:1 pfolio propid using "`brute'", nogen
> assert nid == propid0
> assert abs(km_brute - km_to_nid) < 1e-12
>
> * --------------- end example -----------------------------
>
>
> On Tue, Sep 10, 2013 at 9:49 PM, Joe Canner <[email protected]> wrote:
>>
>> McKay,
>>
>> Take a look at -joinby-. You will probably have to create a duplicate copy of your dataset and rename the property_id, lat, and lon variables in the duplicated data set. Then do:
>>
>> . use original.dta
>> . joinby portfolio_id using duplicate.dta
>> . geodist lat lon duplat duplon
>>
>> (Warning: this will create about 6.5 million records.)
>>
>> Regards,
>> Joe
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of S. McKay Price [[email protected]]
>> Sent: Tuesday, September 10, 2013 6:28 PM
>> To: [email protected]
>> Subject: st: Calculate Distance between Properties within Portfolios
>>
>> Hello,
>>
>> I'm trying to calculate the distance, in miles or kilometers, between
>> all possible pairwise combinations of properties within a given
>> portfolio.  Is there an efficient way to structure the data to
>> accomplish this?
>>
>> My data include numerous portfolios (roughly 200), each with a unique
>> portfolio identifier (portfolio_id).  And, there are multiple properties
>> within each portfolio (180 on average), where each property has a unique
>> property identifier (property_id).   I have latitude and longitude
>> coordinates in decimal form for each property (e.g. 42.270873
>> -83.726329) for use in a command such as -geodist- from SSC, or
>> something similar.  The data are organized as follows:
>>
>> portfolio_id property_id latitude longitude
>> 1 1 lat lon
>> 1 2 lat lon
>> 1 3 lat lon
>> ...
>> 2 1 lat lon
>> 2 2 lat lon
>> 2 3 lat lon
>> etc...
>>
>> Any suggestions?  Thank you for your consideration.
>>
>> McKay
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index