
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: Calculate Distance between Properties within Portfolios


From   "S. McKay Price" <[email protected]>
To   [email protected]
Subject   Re: st: RE: Calculate Distance between Properties within Portfolios
Date   Mon, 16 Sep 2013 11:14:35 -0400

Thank you, Joe and Robert, for your insight and the elegant coding example. Both approaches calculated the distances between properties.

As expected, the brute-force approach (using -joinby-) created millions of observations and took about five minutes to run (my machine has 16 GB of RAM).

The -geonear- approach using the loop generated the distances too, although I was unable to get it to keep the unique portfolio identifiers (for later merging and analysis) when specifying the -long- option. Specifically, the identifiers were dropped when I slightly altered Robert's code to include "long within(24000)", as follows:

* --------------- begin example ----------------------------

* using -geonear- from SSC
use "`main'", clear
sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
  use if pfolio == `i' using "`main'", clear
  save "`nbors'", replace
  geonear propid lat lon using "`nbors'", ///
     n(propid lat lon) ignore long within(24000)
  tempfile res`i'
  save "`res`i''"
}
clear
forvalues i = 1/`npid' {
  append using "`res`i''"
}

* --------------- end example -----------------------------
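Because each pass of the loop handles a single portfolio, one possible workaround (a sketch, not verified against every -geonear- version) is to re-create pfolio from the loop index right after -geonear- runs, so the identifier is saved with each portfolio's results. The block builds a small hypothetical dataset (5 portfolios of 10 properties, in the layout of Robert's example) so it runs on its own, and it assumes -geonear- has been installed from SSC. A radius of within(24000) km exceeds half the earth's circumference (roughly 20,000 km), so every pair within a portfolio should be returned.

```stata
* Sketch: keep the portfolio identifier with -geonear-, long
* Assumes: ssc install geonear
* Hypothetical toy data: 5 portfolios, 10 properties each
clear
set seed 1234
set obs 5
gen pfolio = _n
expand 10
bysort pfolio: gen propid = _n
gen double lat = runiform()
gen double lon = runiform()
tempfile main
save "`main'"

sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
  use if pfolio == `i' using "`main'", clear
  save "`nbors'", replace
  geonear propid lat lon using "`nbors'", ///
     n(propid lat lon) ignore long within(24000)
  * per McKay's report, pfolio is not kept in long form;
  * this pass covers portfolio `i', so add the identifier back
  gen pfolio = `i'
  tempfile res`i'
  save "`res`i''"
}
clear
forvalues i = 1/`npid' {
  append using "`res`i''"
}
```

The appended results can then be merged with other portfolio-level data on pfolio.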

Thanks again! Your suggestions were most helpful and accomplished what I needed.
McKay

On 9/11/2013 1:29 AM, Robert Picard wrote:
As Joe said, -joinby- is the tool to use if you are going to
do this using a brute-force approach. You can also bring out the
big guns and use -geonear- (from SSC). You will have to do
each portfolio separately, but it's still going to be faster
than the brute-force approach.

* --------------- begin example ---------------------------

set seed 1234
clear
set obs 20
gen portfolio_id = 1000 + _n
egen pfolio = group(portfolio_id)
expand runiform() * 360 + 2
sort pfolio
by pfolio: gen propid = _n
sort pfolio propid
gen double lat = runiform()
gen double lon = runiform()
tempfile main
save "`main'"

* brute force approach
rename (propid lat lon) (propid0 lat0 lon0)
joinby pfolio using "`main'"
drop if propid == propid0
isid pfolio propid propid0, sort
geodist lat lon lat0 lon0, gen(km_brute) sphere
sort pfolio propid km_brute propid0
by pfolio propid: keep if _n == 1
tempfile brute
save "`brute'"

* using -geonear- from SSC
use "`main'", clear
sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
   use if pfolio == `i' using "`main'", clear
   save "`nbors'", replace
   geonear propid lat lon using "`nbors'", ///
      n(propid lat lon) ignore
   tempfile res`i'
   save "`res`i''"
}
clear
forvalues i = 1/`npid' {
   append using "`res`i''"
}
merge 1:1 pfolio propid using "`brute'", nogen
assert nid == propid0
assert abs(km_brute - km_to_nid) < 1e-12

* --------------- end example -----------------------------
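Robert's brute-force block keeps only each property's nearest neighbor (the `by pfolio propid: keep if _n == 1` step), which is what lets it be checked against -geonear-'s default output. McKay's original question asked for every pairwise distance, so here is a minimal variant, assuming -geodist- is installed from SSC and using a small hypothetical dataset: it skips the nearest-neighbor step and keeps each unordered pair only once, roughly halving the number of observations compared with keeping both orderings.

```stata
* Sketch: distance for every unordered pair within each portfolio
* Assumes: ssc install geodist
* Hypothetical toy data: 5 portfolios, 10 properties each
clear
set seed 1234
set obs 5
gen pfolio = _n
expand 10
bysort pfolio: gen propid = _n
gen double lat = runiform()
gen double lon = runiform()
tempfile main
save "`main'"

* rename the copy in memory, then pair it with the saved copy
rename (propid lat lon) (propid0 lat0 lon0)
joinby pfolio using "`main'"
* keep each pair once (propid < propid0); this also drops self-pairs
keep if propid < propid0
geodist lat lon lat0 lon0, gen(km) sphere
isid pfolio propid propid0, sort
```

With 10 properties per portfolio, this leaves 10 × 9 / 2 = 45 pairs per portfolio, or 225 observations in total.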


On Tue, Sep 10, 2013 at 9:49 PM, Joe Canner <[email protected]> wrote:
McKay,

Take a look at -joinby-. You will probably have to create a duplicate copy of your dataset and rename the property_id, lat, and lon variables in the duplicated data set. Then do:

. use original.dta
. joinby portfolio_id using duplicate.dta
. geodist lat lon duplat duplon, generate(distance)

(Warning: this will create about 6.5 million records.)
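Joe's estimate follows from the figures in the original question: -joinby- pairs every property with every property in its portfolio (including itself), so 200 portfolios of 180 properties give 200 × 180² = 6,480,000 observations. Because portfolio sizes vary, the actual count is the sum of the squared portfolio sizes, which is at least that large for the same total number of properties. A minimal sketch for checking this before running -joinby-, using hypothetical toy data in McKay's portfolio_id/property_id layout:

```stata
* Sketch: count the observations -joinby- will create before running it
* Hypothetical toy data: 3 portfolios with 4 properties each
clear
set obs 3
gen portfolio_id = _n
expand 4
bysort portfolio_id: gen property_id = _n

preserve
* one row per portfolio, with n = number of properties in it
contract portfolio_id, freq(n)
* each property is paired with all n properties in its portfolio
gen double rows = n^2
quietly summarize rows
local joinby_rows = r(sum)
restore
display "joinby will create `joinby_rows' observations"
```

For the toy data this reports 3 × 4² = 48 observations.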

Regards,
Joe
________________________________________
From: [email protected] [[email protected]] on behalf of S. McKay Price [[email protected]]
Sent: Tuesday, September 10, 2013 6:28 PM
To: [email protected]
Subject: st: Calculate Distance between Properties within Portfolios

Hello,

I'm trying to calculate the distance, in miles or kilometers, between
all possible pairwise combinations of properties within a given
portfolio.  Is there an efficient way to structure the data to
accomplish this?
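For reference, the usual great-circle (haversine) formula for the distance between two points on a sphere is sketched below. To our understanding, this is the kind of calculation -geodist- performs with its -sphere- option (its default instead uses an ellipsoid model of the earth). Here φ is latitude and λ is longitude, both converted from decimal degrees to radians, and R is the sphere's radius, commonly taken as about 6371 km; dividing kilometers by 1.609344 gives miles.

```latex
d = 2R \arcsin\!\left( \sqrt{ \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right),
\qquad
d_{\text{miles}} = \frac{d_{\text{km}}}{1.609344}
```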

My data include numerous portfolios (roughly 200), each with a unique
portfolio identifier (portfolio_id). There are multiple properties
within each portfolio (180 on average), and each property has a unique
property identifier (property_id). I have latitude and longitude
coordinates in decimal form for each property (e.g., 42.270873,
-83.726329) for use in a command such as -geodist- from SSC, or
something similar. The data are organized as follows:

portfolio_id property_id latitude longitude
1 1 lat lon
1 2 lat lon
1 3 lat lon
...
2 1 lat lon
2 2 lat lon
2 3 lat lon
etc...

Any suggestions?  Thank you for your consideration.

McKay

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2018 StataCorp LLC