**[G-2] graph matrix** -- Matrix graphs

__Syntax__

__gr__**aph** **matrix** *varlist* [*if*] [*in*] [*weight*] [**,** *options*]

*options*                                     Description
-------------------------------------------------------------------------
**half**                                      draw lower triangle only

*marker_options*                              look of markers
*marker_label_options*                        include labels on markers
**jitter(***relativesize***)**                perturb location of markers
**jitterseed(***#***)**                       random-number seed for **jitter()**

__diag__**onal(***stringlist***,** ...**)**   override text on diagonal
__diagopt__**s(***textbox_options***)**       rendition of text on diagonal

**scale(***#***)**                            overall size of symbols, labels, etc.
**iscale(**[*****]*#***)**                    size of symbols, labels, within plots

__max__**es(***axis_scale_options*
    *axis_label_options***)**                 labels, ticks, grids, log scales, etc.
*axis_label_options*                          axis-by-axis control

**by(***varlist***, ...)**                    repeat for subgroups

*std_options*                                 titles, aspect ratio, saving to disk
-------------------------------------------------------------------------
All options allowed by **graph** **twoway** **scatter** are also allowed, but they
are ignored.
**half**, **diagonal()**, **scale()**, and **iscale()** are *unique*; **jitter()** and
**jitterseed()** are *rightmost* and **maxes()** is *merged-implicit*; see
repeated options.

*stringlist***,** ..., the argument allowed by **diagonal()**, is defined

[{**.**|**"***string***"**}] [ {**.**|**"***string***"**} ... ] [**,** *textbox_options*]

**aweight**s, **fweight**s, and **pweight**s are allowed; see weight. Weights affect
the size of the markers. See *Weighted markers* in **[G-2] graph twoway**
**scatter**.
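
As a minimal sketch of weighted markers, the commands below assume the **census** dataset shipped with Stata and its variables **medage**, **marriage**, **divorce**, and **pop**; the dataset and variables are our choice for illustration, not part of the syntax above. Hollow markers tend to keep the larger weighted circles legible:

```stata
* Illustrative only: marker size is scaled by state population
sysuse census, clear
graph matrix medage marriage divorce [aweight=pop], msymbol(Oh)
```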

__Menu__

**Graphics > Scatterplot matrix**

__Description__

**graph** **matrix** draws scatterplot matrices.

__Options__

**half** specifies that only the lower triangle of the scatterplot matrix be
drawn.

*marker_options* specify the look of the markers used to designate the
location of the points. The important *marker_options* are **msymbol()**,
**mcolor()**, and **msize()**.

The default symbol is **msymbol(O)** -- solid circles. Specify
**msymbol(Oh)** if you want hollow circles (a recommended alternative).
If you have many observations, we recommend specifying **msymbol(p)**;
see *Marker symbols and the number of observations* under *Remarks*
below. See **[G-4]** *symbolstyle* for a list of marker symbol choices.

The default **mcolor()** is dictated by the scheme; see **[G-4] schemes**
**intro**. See **[G-4]** *colorstyle* for a list of color choices.

Be careful specifying the **msize()** option. In **graph** **matrix**, the size
of the markers varies with the number of variables specified; see
option **iscale()** below. If you specify **msize()**, that will override
the automatic scaling.

See **[G-3]** *marker_options* for more information on markers.
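
The following sketch, which assumes the **auto** dataset used elsewhere in this entry, sets the symbol and color explicitly while leaving **msize()** unspecified so that the automatic scaling described above still applies. The color **navy** is an arbitrary choice:

```stata
* Illustrative only: hollow circles in an explicitly chosen color
sysuse auto, clear
graph matrix mpg price weight length, msymbol(Oh) mcolor(navy)
```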

*marker_label_options* allow placing identifying labels on the points. To
obtain this, you specify the *marker_label_option* **mlabel(***varname***)**; see
**[G-3]** *marker_label_options*. These options are of little use for
scatterplot matrices because they make the graph seem too crowded.

**jitter(***relativesize***)** adds spherical random noise to the data before
plotting. This is useful when plotting data that otherwise would
result in points plotted on top of each other. See *Jittered markers*
in **[G-2] graph twoway scatter** for an explanation of jittering.

**jitterseed(***#***)** specifies the seed for the random noise added by the
**jitter()** option. *#* should be specified as a positive integer. Use
this option to reproduce the same plotted points when the **jitter()**
option is specified.
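
A brief sketch of the two options together, assuming the **auto** dataset; the jitter size and the seed value are arbitrary and chosen only for illustration:

```stata
sysuse auto, clear
* rep78 and headroom take few distinct values, so points overlap
graph matrix mpg rep78 headroom, jitter(3) jitterseed(12345)
```

Rerunning the command with the same seed should reproduce the same plotted points.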

**diagonal(**[*stringlist*][**,** *textbox_options*]**)** specifies text and its style to
be displayed along the diagonal. This text serves to label the
graphs (axes). By default, what appears along the diagonals are the
variable labels of the variables of *varlist* or, if a variable has no
variable label, its name. Typing

**. graph matrix mpg weight displ, diag(. "Weight of car")**

would change the text appearing in the cell corresponding to variable
**weight**. We specified period (**.**) to leave the text in the first cell
unchanged, and we did not bother to type a third string or a period,
so we left the third element unchanged, too.

You may specify *textbox_options* following *stringlist* (which may
itself be omitted) and a comma. These options will modify the style
in which the text is presented but are of little use here. We
recommend that you do not specify **diagonal(,size())** to override the
default sizing of the text. By default, the size of text varies with
the number of variables specified; see option **iscale()** below.
Specifying **diagonal(,size())** will override the automatic size
scaling. See **[G-3]** *textbox_options* for more information on
textboxes.

**diagopts(***textbox_options***)** specifies the look of text on the diagonal. This
option is a shortcut for **diagonal(, ***textbox_options***)**.
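
The sketch below illustrates both forms, assuming the **auto** dataset. It uses the *textbox_option* **color()** rather than **size()**, so the automatic text sizing is left alone; the color itself is an arbitrary choice:

```stata
sysuse auto, clear
* Relabel the second cell and color all diagonal text
graph matrix mpg weight displ, diagonal(. "Weight of car", color(navy))
* Same styling through the shortcut, keeping the default text
graph matrix mpg weight displ, diagopts(color(navy))
```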

**scale(***#***)** specifies a multiplier that affects the size of all text and
markers in a graph. **scale(1)** is the default, and **scale(1.2)** would
make all text and markers 20% larger. See **[G-3]** *scale_option*.

**iscale(***#***)** and **iscale(****#***)** specify an adjustment (multiplier) to be used to
scale the markers, the text appearing along the diagonals, and the
labels and ticks appearing on the axes.

By default, the **iscale()** multiplier shrinks as *n*, the number of
variables specified in *varlist*, grows. The default is a function
f(*n*) -- 0<f(*n*)<1, f'(*n*)<0 -- that multiplies **msize()**,
**diagonal(,size())**, **maxes(labsize())**, and **maxes(tlength())**.

If you specify **iscale(***#***)**, the number you specify is substituted for
f(*n*). We recommend that you specify a number between 0 and 1, but
you are free to specify numbers larger than 1.

If you specify **iscale(****#***)**, the number you specify is multiplied by
f(*n*), and that product is used to scale the markers, text, labels,
and ticks. Here you should specify *#*>0; *#*>1 merely means you want
them to be bigger than **graph** **matrix** would otherwise choose.
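
To make the two forms concrete, here is a hedged sketch assuming the **auto** dataset; the values 0.6 and 1.25 are arbitrary:

```stata
sysuse auto, clear
* Replace the default multiplier f(n) with 0.6
graph matrix mpg price weight length, iscale(0.6)
* Keep f(n) but scale markers and text 25% larger than the default
graph matrix mpg price weight length, iscale(*1.25)
```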

**maxes(***axis_scale_options axis_label_options***)** affect the scaling and look
of the axes. This is a case where you specify options within
options.

Consider the *axis_scale_options* {**y**|**x**}**scale(log)**, which produces
logarithmic scales. Type **maxes(yscale(log)** **xscale(log))** to draw the
scatterplot matrix by using log scales. Remember to specify both
**xscale(log)** and **yscale(log)**, unless you really want just the *y* axis
or just the *x* axis logged.

Or consider the *axis_label_options* {**y**|**x**}**label(,grid)**, which adds grid
lines. Specify **maxes(ylabel(,grid))** to add grid lines across,
**maxes(xlabel(,grid))** to add grid lines vertically, and both options
to add grid lines in both directions. When using both, you can
specify the **maxes()** option twice -- **maxes(ylabel(,grid))**
**maxes(xlabel(,grid))** -- or once combined -- **maxes(ylabel(,grid)**
**xlabel(,grid))** -- it makes no difference because **maxes()** is
*merged-implicit*; see repeated options.

See **[G-3]** *axis_scale_options* and **[G-3]** *axis_label_options* for the
suboptions that may appear inside **maxes()**. In reading those entries,
ignore the **axis(***#***)** suboption; **graph** **matrix** will ignore it if you
specify it.
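
As a rough illustration of options within options, the commands below assume the **auto** dataset and variables that are strictly positive, which log scales require:

```stata
sysuse auto, clear
* Log scales on every axis; all three variables are strictly positive
graph matrix price weight length, maxes(yscale(log) xscale(log))
* Repeated maxes() options are merged, so this adds grid lines in both directions
graph matrix price weight length, maxes(ylabel(, grid)) maxes(xlabel(, grid))
```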

*axis_label_options* allow you to assert axis-by-axis control over the
labeling. Do not confuse this with **maxes(***axis_label_options***)**, which
specifies options that affect all the axes. *axis_label_options*
specified outside the **maxes()** option specify options that affect just
one of the axes. *axis_label_options* can be repeated for each axis.

When you specify *axis_label_options* outside **maxes()**, you must specify
the axis-label suboption **axis(***#***)**. For instance, you might type

**. graph matrix mpg weight displ, ylabel(0(5)40, axis(1))**

The effect of that would be to label the specified values on the
first *y* axis (the one appearing on the far right). The axes are
numbered as follows:

                           *x*             *x*
                       **axis(2)**     **axis(4)**
                +---------------------------------------+
                |       | v1/v2 | v1/v3 | v1/v4 | v1/v5 | *y* **axis(1)**
                |-------+-------+-------+-------+-------|
*y* **axis(2)** | v2/v1 |       | v2/v3 | v2/v4 | v2/v5 |
                |-------+-------+-------+-------+-------|
                | v3/v1 | v3/v2 |       | v3/v4 | v3/v5 | *y* **axis(3)**
                |-------+-------+-------+-------+-------|
*y* **axis(4)** | v4/v1 | v4/v2 | v4/v3 |       | v4/v5 |
                |-------+-------+-------+-------+-------|
                | v5/v1 | v5/v2 | v5/v3 | v5/v4 |       | *y* **axis(5)**
                +---------------------------------------+
                   *x               x               x*
               **axis(1)         axis(3)         axis(5)**

and if **half** is specified, the numbering scheme is

                +-------+
                |       |
                |-------+-------+
*y* **axis(2)** | v2/v1 |       |
                |-------+-------+-------+
*y* **axis(3)** | v3/v1 | v3/v2 |       |
                |-------+-------+-------+-------+
*y* **axis(4)** | v4/v1 | v4/v2 | v4/v3 |       |
                |-------+-------+-------+-------+-------+
*y* **axis(5)** | v5/v1 | v5/v2 | v5/v3 | v5/v4 |       |
                +---------------------------------------+
                   *x       x       x       x       x*
               **axis(1) axis(2) axis(3) axis(4) axis(5)**

See **[G-3]** *axis_label_options*; remember to specify the **axis(***#***)**
suboption, and do not specify the **graph matrix** option **maxes()**.
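
The numbering can be exercised with a short sketch, assuming the **auto** dataset. The label ranges are guesses at sensible values for **mpg** and **weight**, not prescribed ones:

```stata
sysuse auto, clear
* Full matrix: y axis(1) is the mpg row; x axis(2) is the weight column
graph matrix mpg weight displ, ylabel(10(10)40, axis(1)) xlabel(2000(1000)5000, axis(2))
* With half, y axis(2) is the weight row
graph matrix mpg weight displ, half ylabel(2000(1000)5000, axis(2))
```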

**by(***varlist***,** ...**)** allows drawing multiple graphs for each subgroup of the
data. See *Use with by()* under *Remarks* below, and see **[G-3]**
*by_option*.

*std_options* allow you to specify titles (see *Adding titles* under *Remarks*
below, and see **[G-3]** *title_options*), control the aspect ratio and
background shading (see **[G-3]** *region_options*), control the overall
look of the graph (see **[G-3]** *scheme_option*), and save the graph to
disk (see **[G-3]** *saving_option*).

See **[G-3]** *std_options* for an overview of the standard options.

__Remarks__

Remarks are presented under the following headings:

Typical use
Marker symbols and the number of observations
Controlling the axes labeling
Adding grid lines
Adding titles
Use with by()
History

__Typical use__

**graph** **matrix** provides an excellent alternative to correlation matrices
(see **[R] correlate**) as a quick way to examine the relationships among
variables:

**. sysuse lifeexp**

**. graph matrix popgrowth-safewater**

Seeing the above graph, we are tempted to transform **gnppc** into log units:

**. generate lgnppc = ln(gnppc)**

**. graph matrix popgr lexp lgnp safe**

Some people prefer showing just half the matrix, moving the "dependent"
variable to the end of the list:

**. graph matrix popgr lgnp safe lexp, half**

__Marker symbols and the number of observations__

The **msymbol()** option -- abbreviation **ms()** -- allows us to control the
marker symbol used; see **[G-3]** *marker_options*. Hollow symbols sometimes
work better as the number of observations increases:

**. sysuse auto, clear**

**. graph mat mpg price weight length, ms(Oh)**

Points work best when there are many data:

**. sysuse citytemp, clear**

**. graph mat heatdd-tempjuly, ms(p)**

__Controlling the axes labeling__

By default, approximately three values are labeled and ticked on the *y*
and *x* axes. When graphing only a few variables, increasing this often
works well:

**. sysuse citytemp, clear**

**. graph mat heatdd-tempjuly, ms(p) maxes(ylab(#4) xlab(#4))**

Specifying **#4** does not guarantee four labels; it specifies that
approximately four labels be used; see **[G-3]** *axis_label_options*. Also
see *axis_label_options* under *Options* above for instructions on
controlling the axes individually.

__Adding grid lines__

To add horizontal grid lines, specify **maxes(ylab(,grid))**, and to add
vertical grid lines, specify **maxes(xlab(,grid))**. Below we do both and
request that approximately four values be labeled:

**. sysuse lifeexp, clear**

**. generate lgnppc = ln(gnppc)**

**. graph matrix popgr lexp lgnp safe, maxes(ylab(#4, grid) xlab(#4, grid))**

__Adding titles__

The standard title options may be used with **graph** **matrix**:

**. sysuse lifeexp, clear**

**. generate lgnppc = ln(gnppc)**

**. label var lgnppc "ln GNP per capita"**

**. graph matrix popgr lexp lgnp safe, maxes(ylab(#4, grid) xlab(#4, grid))**
         **subtitle("Summary of 1998 life-expectancy data")**
         **note("Source: The World Bank Group")**

__Use with by()__

**graph** **matrix** may be used with **by()**:

**. sysuse auto, clear**

**. graph matrix mpg weight displ, by(foreign)**

See **[G-3]** *by_option*.
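
A small variation, offered as a sketch: the **by()** suboption **total** adds a panel for all observations alongside the subgroup panels. The use of **msymbol(Oh)** here is our choice, not a requirement:

```stata
sysuse auto, clear
* total is a by() suboption that adds a panel for all observations
graph matrix mpg weight displ, by(foreign, total) msymbol(Oh)
```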

__History__

The origin of the scatterplot matrix is unknown, although early written
discussions may be found in Hartigan (1975), Tukey and Tukey (1981), and
Chambers et al. (1983). The scatterplot matrix has also been called the
*draftsman's display* and *pairwise scatterplot*. Regardless of the name
used, we believe that the first "canned" implementation was by Becker and
Chambers in a system called S -- see Becker and Chambers (1984) --
although S predates 1984. We also believe that Stata provided the second
implementation, in 1985.

__References__

Becker, R. A., and J. M. Chambers. 1984. *S: An Interactive Environment*
*for Data Analysis and Graphics*. Belmont, CA: Wadsworth.

Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey. 1983.
*Graphical Methods for Data Analysis*. Belmont, CA: Wadsworth.

Hartigan, J. A. 1975. Printer graphics for clustering. *Journal of*
*Statistical Computation and Simulation* 4: 187-213.

Tukey, P. A., and J. W. Tukey. 1981. Preparation; prechosen sequences of
views. In *Interpreting Multivariate Data*, ed. V. Barnett, 189-213.
Chichester, UK: Wiley.