Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: smaller, smoother maps


From   "Sergiy Radyakin" <[email protected]>
To   [email protected]
Subject   Re: st: smaller, smoother maps
Date   Fri, 29 Aug 2008 19:40:04 -0400

Hi Austin!

My preferred answer would be: use amap which works directly with
shapefiles and works fast enough not to bother about smoothing them. I
had a look at the file you've linked and I had the same problem that
you do with a number of countries that have long coastline, which is
where the large number of points come from. Even after I wrote a
program which calls Stata's [undocumented?] gdi commands it was still
crawling for countries like Indonesia - because of large number of
islands. Now it works in a flash. :)

http://siteresources.worldbank.org/INTPOVRES/Images/MainWindow2.png
http://siteresources.worldbank.org/INTPOVRES/Resources/477227-1184622288835/4002237-1219247341749/ADePT_Maps2.pdf
http://go.worldbank.org/LYW5JRKLP0

If you only prefer spmap/tmap or you are not a Windows user, there is
a website that does exactly what you need: map simplification,
reduction, generalization. The website is called www.mapshaper.org. It
is described, e.g. here:
http://maps.grammata.com/autocarto2006_paper.pdf

On the up side:
1. you can clearly see what's happening to your data
2. you can choose the parameters interactively
3. you can typically drop about 80% of your data without a significant
damage to the way map looks like.

On the down side:
1. you actually need a shapefile to get started. If you have a .dta
file from someone already prepared for spmap/tmap - this will not
work.
2. you need to upload this shapefile to this website - if it is
copyrighted or anything, you might double check if this is allowed. It
does not mean that the file will be automatically exposed there to
anyone, but what do I know...
3. it works almost always, but sometimes it creates strange jitter. I
carefully inspect the resulting shapefile to see if there are any
problems, and if so - do another iteration with different parameters.
It seems that those problems could be shapefile-specific.

After you find a suitable combination of parameters, you can download
the resulting (smaller) shapefile back from this site and convert it
to Stata .dta-format using shp2dta.ado.

If you want to implement simplification yourself, Google
"Douglas-Peucker algorithm". You might want to be more carefull with
removing small polygons. Accidentially removing Kuril islands from the
map of Russia can make lot's of people unhappy.

An important feature of a program that simplifies GIS data is that it
must keep borders seemless - not producing overlap or gaps. Thus you
need to look at the adjacent polygons and add remove points in sync.
Any random drops would probably violate that, unless you restore the
duplicates somehow. I wrote a program to combine polygons in Stata
based on the common points approach some times ago. But it is known
not to work when things get complicated.

Hope this helps.

Regards,
  Sergiy Radyakin





On 8/29/08, Austin Nichols <[email protected]> wrote:
> Dear Statalisters,
>
> I'm curious if anyone has any good ideas about how to smooth map
> polygons in Stata.  Most of the shape files out there on the web seem
> to be very high-resolution, perhaps suitable for navigation but less
> suitable for conveying rough distributions in a graphic (preferably
> with a small file size).  My approach in the past has been to drop
> smaller polygons, pick a random subset of points in all remaining
> polygons, and then to march across observations to put back in points
> that seem to measure large changes in position.  This approach is
> illustrated below.
>
> clear all
> cap copy http://pped.org/stata/uscoord.dta /uscoord.dta
> use /uscoord.dta, clear
> *parameters tol and mod determine coarseness of results
> local tol 1
> local mod 200
> g c=sum(mi(_X))
> g n=_n
> egen count=count(_X), by(_I c)
> g nc=-count
> *keep only the important polygons
> bys _ID (nc n): g p=sum(mi(_X))
> keep if (p<2) | (p<3 & _ID==24) | (p<5 & _ID==13)
> *get a fraction of points on each polygon
> by _ID nc: g y=_Y if mod(_n,int(_N/`mod'))==2 | _ID==1
> g x=_X if y<.
> sort _ID n
> *add back in far-distant points on each polygon
> local r=6371/(2.54*5.280*.12)
> loc p1 "sin(_X*_pi/180)*sin(_X[_n-1]*_pi/180)+cos(_X*_pi/180)"
> loc p2 "*cos(_X[_n-1]*_pi/180)*cos((_Y[_n-1]-_Y)*_pi/180)"
> g d=acos(`p1'`p2')*`r'
> replace y=_Y if (d>`tol' & d<.)|(d[_n+1]>`tol'& d[_n+1]<.)
> replace x=_X if (d>`tol' & d<.)|(d[_n+1]>`tol'& d[_n+1]<.)
> drop if mi(y) & !mi(_Y)
> *drop all but 50 states and DC
> drop if inlist(_ID, 7, 40, 48, 54, 55)
> * doctor loc and size of AK, HI, DC
> replace y=y/2+10 if _ID==56
> replace x=x/3-85 if _ID==56
> replace x=x*1.5+102 if _ID==13
> replace y=y+10 if _ID==13
> replace y=_Y*10-355 if _ID==1
> replace x=_X*10+700 if _ID==1
> *drop extraneous obs
> bys _ID p (n): drop if mi(y)&mi(y[_n-1])&_n>1
> *make an obs to connect first and last points
> bys _ID p (n): g expand=1+(_n==_N)
> expand expand
> bys _ID p (n): replace n=n[2]-1/2 if _n==_N
> sort _ID n
> loc o1 "name(usa, replace) xla(none) yla(none, nogrid) "
> loc o2 "xsize(4) ysize(2) graphr(fc(white))xsc(noline) "
> loc o3 `"ysc(noline) xti("") yti("") leg(off) "'
> sc y x, c(l) ms(none) cmissing(n) `o1' `o2' `o3'
> su x, meanonly
> di %3.2f r(N)/521238*100 "% of orig data"
> replace _X=x
> replace _Y=y
> keep _ID _X _Y
> save /us, replace
> use http://pped.org/stata/spop90.dta, clear
> cap spmap pop using /us, id(id)
> if _rc {
>  ssc inst spmap
>  spmap pop using /us, id(id)
> }
>
> If anyone has any better ideas, please let me know.
>
> Thanks,
> Austin
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index