# Re: st: smaller, smoother maps

 From Maurizio Pisati To statalist@hsphsun2.harvard.edu Subject Re: st: smaller, smoother maps Date Sat, 30 Aug 2008 09:07:25 +0200

Dear Austin,
here's Stata program -simpoly-, my rough attempt to simplify polygons making up -spmap- basemap datasets. There is no help file, but its usage is pretty simple:

. simpoly ARG1 ARG2

where ARG1 denotes the minimum lenght of the polygon sides you want to keep, and ARG2 denotes the minimum perimeter of the polygons you want to keep.
Based on the values of ARG1 and ARG2 specified by the user, -simpoly- tries to be smart in dropping "excess" polygon vertices and/or polygons, but the algorithm implemented therein is quite rough and doesn't always produce proper polygon simplification. Anyway, according to my experince it works fine in many cases.
Here's a brief example based on maps distributed with -spmap-:

. use Italy-OutlineCoordinates.dta, clear
. count
1152

. simpoly 7000 0 // unit of lemgth is meters
Step 1 : 486 records dropped
Step 2 : 177 records dropped
Step 3 : 3 records dropped
Step 4 : 0 records dropped

. count
486

. save "Italy-OutlineCoordinates(Simplified).dta", replace
file Italy-OutlineCoordinates(Simplified).dta saved

. use "Italy-OutlineData.dta", clear
. spmap using "Italy-OutlineCoordinates(Simplified).dta", id(id) ///
poly(data("Italy-OutlineCoordinates.dta") ocol(red)) polyfirst

As you can see, in this example the representation of Italy remains quite good even after dropping more than a half of the polygon vertices.
Best wishes,
Maurizio

*! -simpoly-: Simplifies polygon datasets stored in -spmap- format
*! Version 1.0.0 - 7 December 2006
*! Author: Maurizio Pisati
*! Department of Sociology and Social Research
*! University of Milano Bicocca (Italy)
*! maurizio.pisati@unimib.it

* ----------------------------------------------------------------------------
* 1. Define program
* ----------------------------------------------------------------------------

program simpoly
version 9.2
args LIM1 LIM2

* ----------------------------------------------------------------------------
* 2. Check syntax
* ----------------------------------------------------------------------------

if ("`LIM1'" == "") local LIM1 = 0
if ("`LIM2'" == "") local LIM2 = 0

* ----------------------------------------------------------------------------
* 3. Perform main task
* ----------------------------------------------------------------------------

local GO = 1
local ST = 1
while `GO' {
qui _simplify `LIM1' `LIM2'
di as txt "Step " %2.0f `ST' " : `r(drop)' records dropped"
local ST = `ST' + 1
local GO = `r(drop)' > 0
}
sort _ID, stable

* ----------------------------------------------------------------------------
* 4. End program
* ----------------------------------------------------------------------------

end

* ----------------------------------------------------------------------------
* A1. Subprogram -_simplify-
* ----------------------------------------------------------------------------

program _simplify, rclass
version 9.2
args LIM1 LIM2

tempvar POLYGON SIDE PERIMETER MARK
gen `POLYGON' = (_X == .)
replace `POLYGON' = sum(`POLYGON')
sort `POLYGON', stable
by `POLYGON' : gen `SIDE' = sqrt( (_X - _X[_n-1])^2 + (_Y - _Y[_n-1])^2 )
gen `MARK' = (`SIDE' <= `LIM1')
by `POLYGON' : replace `MARK' = 0 if _n == _N
by `POLYGON' : replace `MARK' = 0 if `MARK'==1 & `MARK'[_n-1]==1 & `MARK'[_n-2]==0
count if `MARK' == 1
local DROP = r(N)
drop if `MARK' == 1
by `POLYGON' : replace `MARK' = 1 if _N <= 4
by `POLYGON' : egen `PERIMETER' = total(`SIDE')
replace `MARK' = 1 if `PERIMETER' <= `LIM2'
count if `MARK' == 1
local DROP = `DROP' + r(N)
drop if `MARK' == 1
return local drop = `DROP'

end

At 00.13 30/08/2008, Austin Nichols wrote:

```Dear Statalisters,

I'm curious if anyone has any good ideas about how to smooth map
polygons in Stata.  Most of the shape files out there on the web seem
to be very high-resolution, perhaps suitable for navigation but less
suitable for conveying rough distributions in a graphic (preferably
with a small file size).  My approach in the past has been to drop
smaller polygons, pick a random subset of points in all remaining
polygons, and then to march across observations to put back in points
that seem to measure large changes in position.  This approach is
illustrated below.

clear all
cap copy http://pped.org/stata/uscoord.dta /uscoord.dta
use /uscoord.dta, clear
*parameters tol and mod determine coarseness of results
local tol 1
local mod 200
g c=sum(mi(_X))
g n=_n
egen count=count(_X), by(_I c)
g nc=-count
*keep only the important polygons
bys _ID (nc n): g p=sum(mi(_X))
keep if (p<2) | (p<3 & _ID==24) | (p<5 & _ID==13)
*get a fraction of points on each polygon
by _ID nc: g y=_Y if mod(_n,int(_N/`mod'))==2 | _ID==1
g x=_X if y<.
sort _ID n
*add back in far-distant points on each polygon
local r=6371/(2.54*5.280*.12)
loc p1 "sin(_X*_pi/180)*sin(_X[_n-1]*_pi/180)+cos(_X*_pi/180)"
loc p2 "*cos(_X[_n-1]*_pi/180)*cos((_Y[_n-1]-_Y)*_pi/180)"
g d=acos(`p1'`p2')*`r'
replace y=_Y if (d>`tol' & d<.)|(d[_n+1]>`tol'& d[_n+1]<.)
replace x=_X if (d>`tol' & d<.)|(d[_n+1]>`tol'& d[_n+1]<.)
drop if mi(y) & !mi(_Y)
*drop all but 50 states and DC
drop if inlist(_ID, 7, 40, 48, 54, 55)
* doctor loc and size of AK, HI, DC
replace y=y/2+10 if _ID==56
replace x=x/3-85 if _ID==56
replace x=x*1.5+102 if _ID==13
replace y=y+10 if _ID==13
replace y=_Y*10-355 if _ID==1
replace x=_X*10+700 if _ID==1
*drop extraneous obs
bys _ID p (n): drop if mi(y)&mi(y[_n-1])&_n>1
*make an obs to connect first and last points
bys _ID p (n): g expand=1+(_n==_N)
expand expand
bys _ID p (n): replace n=n[2]-1/2 if _n==_N
sort _ID n
loc o1 "name(usa, replace) xla(none) yla(none, nogrid) "
loc o2 "xsize(4) ysize(2) graphr(fc(white))xsc(noline) "
loc o3 `"ysc(noline) xti("") yti("") leg(off) "'
sc y x, c(l) ms(none) cmissing(n) `o1' `o2' `o3'
su x, meanonly
di %3.2f r(N)/521238*100 "% of orig data"
replace _X=x
replace _Y=y
keep _ID _X _Y
save /us, replace
use http://pped.org/stata/spop90.dta, clear
cap spmap pop using /us, id(id)
if _rc {
ssc inst spmap
spmap pop using /us, id(id)
}

If anyone has any better ideas, please let me know.

Thanks,
Austin
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
```*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```