Note: This FAQ is relevant for users of releases prior to
Stata 10.
How do I graph data onto a map with tmap?
Title: Working with tmap and maps
Author: Kevin Crow and William Gould, StataCorp
Date: April 2006; minor revisions May 2007
Introduction
With tmap, you can graph data onto maps, for example, shading each U.S.
state according to its population.
tmap is a user-written command by Maurizio Pisati. This FAQ explains
how to use tmap. The process is as follows:
- Obtain and install the tmap, shp2dta, and mif2dta
commands.
- Search the web for the files that describe the map onto which you
want to graph your data. You can use ESRI shapefiles or MapInfo
Interchange Format.
ESRI shapefiles are the more common. In this format, there are three
files associated with a map: an .shp shape file, a .dbf
dBASE file, and an .shx index file. You only need the
.shp and .dbf files. You translate those files into a
format usable by Stata with the shp2dta command. Doing so
creates two .dta datasets, one corresponding to each file.
The MapInfo Interchange Format consists of two files with suffixes
.mif and .mid. You translate these files with the
mif2dta command. Just as with shp2dta, doing so creates
two .dta datasets.
- Look at the translated .dbf (.mid) file.
It is now an ordinary .dta dataset, so you can open it directly. Examine the
dataset to determine the coding used by the map's authors to designate
areas. For instance, 1 might mean Alaska and 2 Alabama in one
dataset, and 1 might mean Albania and 2 Argentina in another.
- You have data you want to plot onto a map. Let’s assume that the
data are stored in a Stata .dta dataset. You need to modify
your .dta dataset to use the same location coding as that used
in the map. Call that variable id.
- Merge on id the translated .dbf (.mid) dataset
with your dataset containing the statistics to be graphed.
- With the merged dataset in memory, make the graph by using
tmap. You will tell tmap about the other translated
dataset (the coordinate dataset) by using an option.
Step 1: Obtain and install the tmap, shp2dta, and mif2dta commands
Type
. ssc install tmap
. ssc install shp2dta
. ssc install mif2dta
You need to perform this step only once.
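If you are unsure whether the commands are already installed, a do-file along the following lines could check first. This is only a sketch: it relies on which returning a nonzero return code when a command cannot be found, and it still needs an Internet connection for any command that is missing.

```stata
* Install each command only if Stata cannot already find it
foreach cmd in tmap shp2dta mif2dta {
    capture which `cmd'
    if _rc {
        ssc install `cmd'
    }
}
```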
Step 2: Find a map (an ESRI shapefile or a MapInfo Interchange Format file)
A map records the geometry and attribute information of spatial features.
Those maps are available from public and private sources. You can use maps
recorded in either of two formats:
- ESRI shapefiles. This format was developed by the Environmental Systems
  Research Institute, Inc. Shapefiles come in several types.
  You will want a "polygon shapefile", the
  form suitable for maps. The map is stored in three files:
  - .shp, the coordinates;
  - .shx, an index; and
  - .dbf, the codings.
- MapInfo Interchange Format. This format was developed for use with the
  MapInfo software.
  The map is stored in two files:
  - .mif, the coordinates; and
  - .mid, the codings.
It is usually easier to find ESRI shapefiles than MapInfo Interchange Format
files, but you may use either.
Say you want to find a map of the United States. Using a search engine such
as Google or Yahoo!, search for "United States shapefile". One result is
described as "This dataset is a polygon shapefile containing the
states and territories of the United States ...". We found
http://www.nws.noaa.gov/geodata/catalog/national/html/us_state.htm and
clicked "Download Compressed Shapefile". We unzipped
s_14jl05.zip, which contained the following files:
s_14jl05.shp
s_14jl05.shx
s_14jl05.dbf
These are the filenames as of May 2007. They will most likely change over time.
We need only two of the files, s_14jl05.shp and s_14jl05.dbf.
Had we searched for a MapInfo map, there would have been only two files, and
they probably would have been called s_14jl05.mif and
s_14jl05.mid.
Step 3: Translate the files
With the files we just extracted in the current directory, we type the
following in Stata:
. shp2dta using s_14jl05, database(usdb) coordinates(uscoord) genid(id)
Pay attention to the three options we specified:
- database(usdb) specified that we wanted the database file to be
named usdb.dta.
- coordinates(uscoord) specified that we wanted the coordinate
file to be named uscoord.dta.
- genid(id) specified that we wanted the ID variable created in
usdb.dta to be named id.
shp2dta can take several minutes to run, depending on the map's size
and level of detail. The U.S. map, however, took only a few seconds.
We would have translated MapInfo files the same way, but we would have used
the command mif2dta instead of shp2dta.
In any case, the translation has created two new .dta datasets:
usdb.dta and uscoord.dta.
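Before going further, it may be worth confirming that both files were written and taking a quick look at the coordinate dataset. The sketch below assumes the filenames chosen above. We would expect the coordinate file to hold one observation per boundary point, but the exact variable names depend on the version of shp2dta you installed.

```stata
* Confirm that both translated datasets exist in the current directory
confirm file usdb.dta
confirm file uscoord.dta

* Inspect the coordinate dataset without loading it into memory
describe using uscoord
```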
Step 4: Determine the coding used by the map
To determine the coding used by the map's authors, type
. use usdb, clear
. describe
Contains data from usdb.dta
  obs:            56
 vars:             6                          29 Mar 2006 11:52
 size:         2,744 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
STATE           str2   %9s
NAME            str24  %24s
FIPS            str2   %9s
LON             double %10.0g
LAT             double %10.0g
id              byte   %9.0g
-------------------------------------------------------------------------------
Sorted by:  id
. list id NAME in 1/5
     +-------------------------------+
     | id                       NAME |
     |-------------------------------|
  1. |  1       District of Columbia |
  2. |  2                    Arizona |
  3. |  3                       Ohio |
  4. |  4                 California |
  5. |  5                    Alabama |
     +-------------------------------+
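Two further checks can be useful at this point. The sketch below assumes usdb.dta as created above. It confirms that id identifies the observations uniquely and looks up the codes of particular areas by name. The two names are only examples.

```stata
use usdb, clear

* Stop with an error if id does not uniquely identify the observations
isid id

* Look up the codes the map uses for particular areas
list id NAME STATE FIPS if inlist(NAME, "Alaska", "Hawaii")
```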
Let’s shift away for a minute from the details of this map and talk
about the graph we want to draw. We want to graph population by state, and
we have a dataset named stats.dta
containing population figures. In our dataset, we have states recorded
using a different coding, and the identification variable is called
scode.
We must modify our dataset to use the same coding as the map, and the
variable containing the codes must be named id.
To achieve our goal, we made an intermediate dataset called
trans.dta that contained two variables,
scode and id. Each observation records equivalent codes.
When we created trans.dta, we happened to look more carefully at
usdb.dta. We discovered that the map dataset contained information
about not only U.S. states, but also territories. We will just ignore that
extra information. Our trans.dta dataset records only the 51
observations we care about, one for each state plus Washington, D.C.
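How you build such a crosswalk depends on the coding in your own data. As one hedged illustration, suppose scode happened to be the numeric FIPS code; that is an assumption, not the coding we actually used. The crosswalk could then be derived from the map's own database file. The sketch below uses a small toy stand-in for usdb.dta, and the id given to the territory is made up.

```stata
* Toy stand-in for usdb.dta (the territory's id is made up)
clear
input byte id str2 FIPS str24 NAME
1 "11" "District of Columbia"
2 "04" "Arizona"
3 "39" "Ohio"
9 "72" "Puerto Rico"
end

* Assumption: scode in stats.dta is the numeric FIPS code
destring FIPS, generate(scode)

* FIPS codes above 56 denote territories, which we ignore
drop if scode > 56

* Keep only the two code variables, one observation per area
keep scode id
sort scode
list
```

With the real usdb.dta in memory instead of the toy data, you would finish by saving the result as trans.dta.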
Then we merged our stats.dta with trans.dta based on
scode:
. use stats
. merge scode using trans, sort unique
To ensure that there were no errors, we checked that all
observations matched (_merge==3) and then dropped the _merge
variable:
. tabulate _merge
(output omitted)
. drop _merge
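In a do-file, the visual check with tabulate can be replaced by assert, which stops execution if any observation failed to match. The following sketch uses toy data with hypothetical codes and made-up population figures, and it uses the same merge syntax as above.

```stata
* Toy crosswalk with hypothetical codes
clear
input byte scode byte id
10 1
20 2
30 3
end
sort scode
tempfile trans_demo
save `trans_demo'

* Toy statistics dataset with made-up population figures
clear
input byte scode long pop1990
10 500000
20 3000000
30 10000000
end

merge scode using `trans_demo', sort unique

* Stop with an error unless every observation matched
assert _merge == 3
drop _merge
```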
Step 5: Merge datasets
We now must merge stats.dta with usdb.dta from the map, and
this merge is based on the id variable:
. merge id using usdb, sort unique
Because our map includes locations not included in our original data,
namely, territories as well as states, there will be observations in
usdb.dta that are not also in stats.dta. We should check our
merge:
. tabulate _merge
Here we expect every _merge value to be either 2 (the observation appears
only in usdb.dta) or 3 (matched). If our map did not include territories, or
if our original data did, we would expect all _merge values to be 3.
Finally, drop the unnecessary observations:
. drop if _merge!=3
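The same expectation can be written as an assertion. In the toy sketch below, the third area stands in for a territory that appears in the map but not in our data; all names, codes, and figures are made up.

```stata
* Toy map database: id 3 stands in for a territory absent from our data
clear
input byte id str24 NAME
1 "District of Columbia"
2 "Arizona"
3 "Some Territory"
end
sort id
tempfile usdb_demo
save `usdb_demo'

* Toy statistics dataset, already carrying the map's id
clear
input byte id long pop1990
1 500000
2 3000000
end

merge id using `usdb_demo', sort unique

* Only 2 (map only) and 3 (matched) are acceptable here
assert inlist(_merge, 2, 3)
drop if _merge != 3
```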
Step 6: Draw the graph
To draw the graph, type
. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues)
We will soon deal with Alaska and Hawaii and the effect they have on our
graph. Right now, focus on what we typed:
. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues)
Choropleth is not the name of a variable in our dataset; it is the kind of
graph we want to draw. In a choropleth graph, different areas have
different colors. tmap can draw other kinds of
graphs, too.
Let’s go over the options we specified:
- id(id). Specifying this is not optional. This is the name of
the variable that identifies the locations. Earlier we told you to
name the identification variable id. If you named it something
else, you would specify the name you used inside the parentheses.
- map(uscoord.dta). Specifying this is not optional, either.
tmap needs to know the name of the coordinates dataset.
Remember, we used shp2dta to create two datasets—a
database, which we merged with our statistical data, and a coordinate
dataset, which we named uscoord.dta.
- palette(Blues). This is the one option we specify that really
is optional. The Blues palette uses shades of blue for the colors.
In the command
. tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues)
we specified variable pop1990, and in the dataset, that variable
contains the population. The units do not matter; the data could just as
well be coded in millions and we would have obtained the same graph,
although the legend would change.
By default, tmap choropleth divides the specified variable into four
groups that are based on quartiles. You can change the number of groups by
using option clnumber(#), where # can be between
2 and 9.
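To get a feel for what a four-group split based on quartiles looks like, you can build one yourself with xtile. The sketch below uses census.dta, which ships with Stata and records 1980 population in the variable pop; it is not the dataset used in this FAQ. The class boundaries that tmap computes may differ in detail. The final comment shows how a different number of groups would be requested.

```stata
* Example dataset shipped with Stata (1980 census figures)
sysuse census, clear

* Assign each state to one of four population quartile groups
xtile popgrp = pop, nquantiles(4)
tabulate popgrp

* With tmap, five groups instead of four would be requested like this:
* tmap choropleth pop1990, id(id) map(uscoord.dta) palette(Blues) clnumber(5)
```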
We will stick with four groups. However, we want to exclude Alaska and
Hawaii from our graph. To do that, type
. tmap choropleth pop1990 if id!=13 & id!=56, id(id) map(uscoord.dta) palette(Blues)
or
. tmap choropleth pop1990 if NAME!="Alaska" & NAME!="Hawaii", id(id) map(uscoord.dta) palette(Blues)
The first command works because 56 and 13 are the id codes for Alaska and
Hawaii; the second works because our dataset happens to contain variable
NAME, which records the name in string form. Either way, we obtain this graph:
Look closely at the legend and you will see that the population ranges are
displayed in scientific notation. You can change the display format with
option legformat(format). You might specify
legformat(%20.0f). Or you can change the units of the variable. We
will change population to be recorded in millions:
. replace pop1990 = pop1990/1e+6
The legend is also too small. You can make the legend bigger with option
legsize(#), where # specifies a text-size
multiplier, such as 2. Our improved graph is shown below:
. tmap choropleth pop1990 if id!=13 & id!=56, id(id) map(uscoord.dta) palette(Blues) legsize(2)
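If you would rather keep the population in its original units, the two legend options can be combined instead. The sketch below assumes that pop1990 has not been rescaled to millions and that the merged dataset is still in memory.

```stata
* Assumes pop1990 is still recorded in persons, not millions
tmap choropleth pop1990 if id!=13 & id!=56, id(id) map(uscoord.dta) palette(Blues) legformat(%20.0f) legsize(2)
```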
tmap has many other options. Read about them in the online help
file (type help tmap) or in the original article by Maurizio Pisati (2004).
Friedrich Huebler’s blog, at
http://huebler.blogspot.com, occasionally discusses tmap.
Reference
- Pisati, M. 2004. Simple thematic mapping. Stata Journal 4: 361–378.