Statalist The Stata Listserver


st: RE: Weighted scatterplot with -mlabel-

From   "Nick Cox" <>
To   <>
Subject   st: RE: Weighted scatterplot with -mlabel-
Date   Thu, 8 Jun 2006 13:55:39 +0100

This has been answered, but a broader issue is how far 
such plots can be taken seriously. I don't have any
easy answers, just a few awkward questions. 

Psychologically and socially, the question arises of 
what the graph designer thinks is being shown, what
the designer tells the graph reader and what the 
reader thinks is being shown. 

Assume for the sake of argument that the symbol chosen
is a circle. Similar issues arise with any other choice 
of symbol. 

In principle there are two simple choices, scaling 
each circle so that its radius (or equivalently, 
its diameter) is proportional to
some other variable, and scaling each circle so
that its area is proportional. Evidently results
will differ. This is a matter of simple dimensional
analysis and nothing to do with any choice of multiplicative
constant.
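
To make the difference concrete, here is a minimal sketch in Stata. The two weights (1 and 4) are made up purely for illustration and are not taken from any dataset.

```stata
* Hypothetical weights chosen only for illustration
local w1 = 1
local w2 = 4

* Radius proportional to weight: area goes as radius squared
display "radius scaling, ratio of areas: " (`w2'/`w1')^2

* Area proportional to weight: radius goes as square root of area
display "area scaling, ratio of radii:   " sqrt(`w2'/`w1')
```

With radius scaling, a fourfold difference in weight is drawn as a 16-fold difference in area. With area scaling, the same difference is drawn as only a twofold difference in radius.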

Given this choice, my guess is that most readers
of this list will assert that choice of areas is 

However, it is not difficult to find examples 
in which there is sufficient variation in the weight 
variable that the resulting graph would be regarded 
as absurd or at least unattractive on other grounds: 
the largest symbols would be very large, 
or the smallest symbols would be difficult or impossible
to distinguish, or both. 

As a matter of fact, Stata does not use any such naive 
scaling. It uses a different algorithm, although
the current documentation (version 9 [G] p.293) 
is uninformative about precisely what that is. 
(Apologies if I have missed more detail elsewhere. 
It is my recollection that previous manual versions
were not so coy or so cryptic, but I don't know whether
those previous versions apply to what is done in Stata 8/9.)

To muddy the issue further, psychological experiments 
in which people were asked to infer magnitudes from
proportional symbols indicate that people perceive
not area, but area to some power like 0.7 or 0.8, except
that there is, unsurprisingly, a lot of variability too. 
Crudely, on average people "see" magnitudes not according
to areas or radii, but somewhere in between. 
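
As a rough illustration of "somewhere in between", the sketch below assumes that perceived magnitude goes as area raised to some power. The factor of 4 between the two areas is hypothetical. A power of 0.5 corresponds to judging by radius and a power of 1 to judging by area.

```stata
* Hypothetical pair of circles whose areas differ by a factor of 4
local ratio = 4

* Perceived ratio if magnitude is judged as area^p
foreach p in 0.5 0.7 0.8 1 {
    display "power `p': perceived ratio = " %5.2f `ratio'^`p'
}
```

For powers of 0.7 and 0.8 the perceived ratio comes out at roughly 2.6 and 3.0, between the ratio of radii (2) and the ratio of areas (4).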

So where are we? What people "see" is beyond your control --
although it should have bearing on your choices -- 
but what you present in a paper or a talk is your choice, 
modulo some boss, manager, supervisor, advisor, or committee 
looming behind you. 

1. An informed Stata user might say, "The sizes of the 
symbols indicate the size of a third variable, but 
the scale should be interpreted as a monotonic scale, 
not literally." Bigger means bigger, in other words? 
This is correct, but might well be rejected as too
vague by a careful critic. 

2. A less informed Stata user might say, 
"The sizes of the symbols are proportional to the size 
of a third variable." But they would be wrong! And
if obliged to specify, radius or area, they would be 

I don't find either of these situations very satisfying. 
This is not a critique of StataCorp, as I suspect that
they are subject to market pressure on this point. ("What,
no scope for proportional symbols? Sounds like a lousy
graphics provision to me!")

Naturally, there might be another view: 

0. Don't be so pedantic. Graphs like these are essentially
indicative or exploratory. No analyst in their right mind
would come to conclusions or decisions on such graphs alone. 

Also, I have no easy alternatives. 

3. If a third variable is important, then one possibility
is that we need some kind of 3-D graphics to show it. But,
user-written exceptions aside, this is still on the agenda. 

4. Occasionally, just showing the magnitude as a numeric 
marker label can be helpful. Clearly, this is not a general
solution.

5. Other kinds of plots in which you have _exact_ control
over the size of the "symbol" shown are also possible. 
See -diplot- from SSC. 
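
Point 4 can be sketched with the auto data from the question quoted below. This is only one way to do it: the marker symbol is suppressed and the value of the third variable is printed at each point's position. The choice of -weight- as the label variable simply follows the original example.

```stata
sysuse auto, clear

* Suppress the marker and centre the value of weight on each point
scatter price mpg, msymbol(none) mlabel(weight) mlabposition(0)
```

With 74 cars the labels overlap in places, which is one reason this is not a general answer.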

Other views? 


Richard Upward
> I am trying to produce a weighted scatterplot which includes 
> labels for each point.  But when I use the -mlabel- option, 
> the weighting seems to be removed.  A silly example:
> sysuse auto
> scatter price mpg [fweight=weight]
> scatter price mpg [fweight=weight], mlabel(make)
> The first scatter command weights the points, the second does 
> not.  Is this behaviour a "bug" or a "feature", or am I 
> missing something obvious?
> I am using Stata/SE 9.2 for Windows
> Born 17 May 2006
