
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: Scatterplot with weighted markers


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: Scatterplot with weighted markers
Date   Tue, 15 Feb 2011 18:42:02 +0000

Thanks to Allan for digging up my old post. 

I don't want to add to it, except to underline that Stata lets you play not only with this recipe, but with others. -tabplot- from SSC already lets you show vertical or horizontal bars with specified heights (lengths) and you can put them at specified x, y coordinates. 

The following example shows the possibilities more directly: 

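sysuse auto, clear

* bar ends: mpg plus or minus price/7000, so bar length is linear in price
gen mpg1 = mpg - price/7000
gen mpg2 = mpg + price/7000

* hollow circles for the data, unfilled range bars centred on each point
* (run from a do-file: /// continues the command onto the next line)
scatter mpg weight, ms(oh) || rbar mpg1 mpg2 weight, ///
    ytitle("`: var label mpg'") bfcolor(none) barw(100) legend(off)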
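The linear scaling claimed for this recipe is easy to check by hand. Below is a minimal sketch in Python, used here only as a calculator (it is not a Stata API). It assumes the price extremes 3291 and 15906 reported by -summ price- further down the thread. Each bar runs from mpg - price/7000 to mpg + price/7000, so its length is 2*price/7000, a linear function of price.

```python
# Arithmetic behind the rbar recipe: bars run from mpg - price/7000
# to mpg + price/7000, so bar length is linear in price.
SCALE = 7000  # divisor used in the gen statements above

# Price extremes in the auto data (from the summ price output in the thread).
PRICE_MIN, PRICE_MAX = 3291, 15906


def bar_ends(mpg, price, scale=SCALE):
    """Return (lower, upper) bar ends, mirroring gen mpg1 / gen mpg2."""
    half = price / scale
    return mpg - half, mpg + half


def bar_length(price, scale=SCALE):
    """Length of one bar in mpg units, i.e. 2 * price / scale."""
    lo, hi = bar_ends(0.0, price, scale)
    return hi - lo


if __name__ == "__main__":
    print(f"shortest bar: {bar_length(PRICE_MIN):.3f} mpg units")
    print(f"longest bar:  {bar_length(PRICE_MAX):.3f} mpg units")
    # Linear scaling: the ratio of bar lengths equals the ratio of prices.
    print(f"length ratio: {bar_length(PRICE_MAX) / bar_length(PRICE_MIN):.3f}")
    print(f"price ratio:  {PRICE_MAX / PRICE_MIN:.3f}")
```

Because the bar length depends on price alone and the widths are fixed by barw(), a reader can compare heights directly, which is the point of the recipe.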

The key points are all very simple:

1. If you show bars, you can control the scaling directly. You know, and can tell your readers, that it is linear (or whatever else you choose). 

2. If you use constant widths, the user only has to interpret variations in height and there is no dimensional ambiguity. 

3. Naturally, bars can be transparent to allow overlap. 

Of course, this method also has one big disadvantage: it arbitrarily uses only one dimension. But it does give genuinely proportional symbols. 

Nick 

Allan Reese

Some time back, Nick Cox questioned
(http://www.stata.com/statalist/archive/2006-06/msg00291.html) whether
this feature is sensible.  As he pointed out, the interpretation of
symbol "size" depends on the individual viewer. (For fans of Father Ted,
"This is *small* but that is *far away*.")

I agree that the perception will be individual and impressionistic, but
it can be used in just that way.  Hence it becomes a design feature:
the person designing the graph can choose the scaling in just the same
way as you choose the right (best, most-fitting, perfect) word to give
the preferred degree of emphasis to a statement.

I'm writing because I've been experimenting with weighting symbol
size by functions of the sample size for each point.  I saw such a plot
in a paper and thought it gave a very useful impression of how much
confidence you might have in the fitted model (line).

This is very easy to experiment with.  Using the classic auto data as an
example (weighting by price rather than by a sample size n):
. sysuse auto, clear
. scatter weight length [w=price], ms(oh)
. scatter weight length [w=sqrt(price)], ms(oh)
. scatter weight length [w=log10(price)], ms(oh)
. summ price

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906

Logic says sqrt is "proportional".  Economics says log is "utility", and
maybe sqrt(log10(price)) is the visual logic.  On the other hand, the
ratio of biggest to smallest price is only about 5:1 (15906/3291), and
simple weighting gives the clearest visual feel for "more expensive".
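How much contrast each transform leaves can be checked from the summary above. Below is a minimal sketch in Python, used only as a calculator. It assumes the minimum 3291 and maximum 15906 from the -summ price- output. It prints the largest:smallest weight ratio for each transform mentioned. It does not model how Stata converts weights into marker sizes.

```python
import math

# Price extremes in the auto data, taken from the summ price output.
PRICE_MIN, PRICE_MAX = 3291, 15906

# Candidate weight transforms discussed in the post.
TRANSFORMS = {
    "price": lambda p: p,
    "sqrt(price)": math.sqrt,
    "log10(price)": math.log10,
    "sqrt(log10(price))": lambda p: math.sqrt(math.log10(p)),
}


def weight_ratio(f, lo=PRICE_MIN, hi=PRICE_MAX):
    """Largest:smallest weight ratio after applying transform f."""
    return f(hi) / f(lo)


if __name__ == "__main__":
    for name, f in TRANSFORMS.items():
        print(f"{name:>20}: {weight_ratio(f):.2f} : 1")
```

The compressing transforms squeeze an already modest ratio of under 5:1 down to barely more than 1:1. This is consistent with the observation that the raw weights give the clearest impression.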

You can expand the ratio by subtracting an offset that greatly increases
the contrast, though the smallest values then disappear as points.
. scatter weight length [w=(price-3000)], ms(oh)
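Subtracting the same constant from every weight shrinks the small weights proportionally far more than the large ones, which is why the contrast grows so quickly. Below is a minimal sketch in Python, used only as a calculator. It assumes the price extremes 3291 and 15906 from the -summ price- output. The guard against offsets at or above the smallest price is an assumption of this sketch, added because such offsets would make some weights zero or negative.

```python
# Price extremes in the auto data, taken from the summ price output.
PRICE_MIN, PRICE_MAX = 3291, 15906


def offset_ratio(offset, lo=PRICE_MIN, hi=PRICE_MAX):
    """Largest:smallest ratio of the weights (price - offset)."""
    if offset >= lo:
        # Hypothetical guard: weights would no longer all be positive.
        raise ValueError("offset must be below the smallest price")
    return (hi - offset) / (lo - offset)


if __name__ == "__main__":
    for c in (0, 2000, 3000, 3200):
        print(f"offset {c:>4}: {offset_ratio(c):6.1f} : 1")
```

With an offset of 3000 the ratio becomes 12906/291, roughly 44:1. That is why the cheapest cars shrink almost to nothing.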

While you can choose an appropriate ratio between smallest and largest
symbols, there is currently no way to scale all the symbols so the
largest do not overlap or to make the smallest more visible.
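Choosing the ratio can be made explicit by solving for the offset instead of guessing it. Below is a minimal sketch in Python, used only as a calculator, with a hypothetical helper offset_for_ratio and the price extremes from the -summ price- output. Solving (hi - c) = R * (lo - c) gives c = (R * lo - hi) / (R - 1). A negative c means adding a constant, which narrows the ratio and makes the smallest symbols relatively larger.

```python
# Price extremes in the auto data, taken from the summ price output.
PRICE_MIN, PRICE_MAX = 3291, 15906


def offset_for_ratio(target, lo=PRICE_MIN, hi=PRICE_MAX):
    """Hypothetical helper: offset c with (hi - c) / (lo - c) == target.

    From hi - c = target * (lo - c): c = (target * lo - hi) / (target - 1).
    Then lo - c = (hi - lo) / (target - 1) > 0, so all weights stay positive.
    """
    if target <= 1 or hi <= lo:
        raise ValueError("need target > 1 and hi > lo")
    return (target * lo - hi) / (target - 1)


if __name__ == "__main__":
    for r in (2, 5, 10, 50):
        c = offset_for_ratio(r)
        print(f"ratio {r:>2}:1 -> use weights price - c with c = {c:.1f}")
```

This pins down only the largest:smallest ratio. As the post says, it does not control the absolute marker sizes, so overlap among the largest symbols is not addressed.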


