


Re: st: visualization?

From   Nick Cox
Subject   Re: st: visualization?
Date   Fri, 30 Sep 2011 15:23:22 +0100

Do you mean Vince Viggins? Sounds like a Dickens character. We saw
Vince Wiggins at the London meeting.

We are starting with these suggestions. I'll add numbers for convenience.

1. If one of the variables is positively skewed, consider plotting
that axis on a log scale.

2. If there are a lot of data points (e.g., n > 1000), adopt a
different strategy, such as some form of partial transparency or
sampling the data.

3. If one of the variables takes on a limited number of discrete
values, consider using jitter or a sunflower plot.

4. If there are three or more variables, consider using a scatterplot matrix.

5. Fitting some form of trend line is often useful.

6. Adjust the size of the plotting character to the sample size (for
bigger n, use a smaller plotting character).
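
Suggestions 2 and 3 both come down to simple manipulations of the data
before plotting. The following is a minimal sketch in Python (standard
library only, used here as neutral notation rather than Stata syntax).
The function names `thin_sample` and `jitter`, the fixed seed, and the
uniform noise are illustrative assumptions, not anything proposed in
the thread.

```python
import random


def thin_sample(points, k, seed=12345):
    """Return a simple random sample of at most k points, without replacement.

    The seed is fixed only so that the same subset is drawn each time
    (an assumption made for reproducible graphs).
    """
    points = list(points)
    if len(points) <= k:
        return points
    return random.Random(seed).sample(points, k)


def jitter(values, width, seed=12345):
    """Add uniform noise in [-width/2, +width/2] to each value.

    Intended for a variable with few distinct values, so that
    coincident points are spread apart slightly.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width / 2, width / 2) for v in values]


# Hypothetical data: x is continuous, y takes only five distinct values.
data = [(i / 10, i % 5) for i in range(5000)]
subset = thin_sample(data, 1000)
y_jittered = jitter([y for _, y in subset], width=0.4)
```

The jitter width should stay well below the spacing between distinct
values, so that jittered points remain visibly attached to their
original category.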

Random comments

1. I take this as standard. I'll add a plea for consideration of any
reasonable non-linear scale, labelled in the original units!

2 and 6. Transparency is on some wishlists for Stata. With lots of
data, go not only for smaller symbols but also for more open ones, and
use lighter colors.

3. I've played with sunflower plots and gone off them. But if you want
to try them, note that they are undocumented [sic] at -help twoway
sunflower-. For highly discrete or even categorical variables, I like
my -tabplot- (SSC).

4. Agree, although that does not rule out some projection from a
multivariate analysis being helpful too.

5. Yes, if "trend" means "smooth". Some special smooths were published in

SJ-10-1 gr0021_1  . . . . . . . . . .  Software update for doublesm and diagsm
        (help doublesm, diagsm, polarsm if installed) . . . . . . .  N. J. Cox
        Q1/10   SJ 10(1):164
        option to carry out smoothing using restricted cubic splines
        added to doublesm and diagsm

SJ-5-4  gr0021  . . . . . . .  Speaking Stata: Smoothing in various directions
        (help doublesm, diagsm, polarsm if installed) . . . . . . .  N. J. Cox
        Q4/05   SJ 5(4):574--593
        discusses exploratory tools for determining the structure
        of bivariate data
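
Comment 1 above asks for non-linear scales labelled in the original
units. For a log scale, that means placing ticks at the logarithms of
round values while printing the round values themselves. A minimal
sketch in Python (standard library only); the function name
`log_axis_ticks` and the 1-2-5 choice of round values are assumptions
made for illustration.

```python
import math


def log_axis_ticks(lo, hi, steps=(1, 2, 5)):
    """Return (position, label) pairs for a log10 axis covering [lo, hi].

    position is log10(value), i.e. where the tick sits on the
    transformed axis; label is the value in its original units.
    """
    if lo <= 0 or hi < lo:
        raise ValueError("need 0 < lo <= hi")
    ticks = []
    first = math.floor(math.log10(lo))
    last = math.ceil(math.log10(hi))
    for exponent in range(first, last + 1):
        for step in steps:
            value = step * 10.0 ** exponent
            if lo <= value <= hi:
                ticks.append((math.log10(value), f"{value:g}"))
    return ticks


# Ticks for a hypothetical variable ranging from 3 to 300.
for position, label in log_axis_ticks(3, 300):
    print(f"{position:.3f}  {label}")
```

The same idea carries over to any other monotone transformation:
compute tick positions on the transformed scale, but keep the labels in
the units the reader already knows.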

Some possible additions:

7. About 1980, there was a sudden fashion for adding convex hulls,
which faded away quickly. I remember often doing it with a pencil on
lineprinter output. But Allan Reese has a nice implementation on SSC
as -cvxhull-. On occasion that helps a lot.

8. When you have a categorical subdivision, try out both several
categories superimposed and a -by()- option to give separate plots. A
third strategy is given in

SJ-10-4 gr0046  . . . . . . . . . . . . . . . Speaking Stata: Graphing subsets
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/10   SJ 10(4):670--681                                (no commands)
        explores graphical comparison of results for two or more
        subsets where each subset is plotted in a separate panel,
        with the rest of the data as a backdrop
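
For readers who have not met the convex hulls of item 7, the hull is
the smallest convex polygon enclosing all the points. A minimal sketch
in Python follows, using Andrew's monotone chain algorithm. This is one
standard method, chosen here for brevity; it is not claimed to be what
-cvxhull- does internally, and the function name `convex_hull` is
illustrative.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order.

    points is an iterable of (x, y) tuples. Duplicates are dropped and
    collinear points on the boundary are omitted.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Discard the last vertex while it fails to make a left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical data: four corners, one interior point, one edge point.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
```

Joining the returned vertices in order, and closing back to the first,
gives the outline to draw over the scatter.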


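The third strategy in item 8 (each subset in its own panel, with the
rest of the data as a backdrop) needs only a small amount of data
preparation. A minimal sketch in Python; the function name
`backdrop_panels` and the tuple layout of the rows are assumptions for
illustration, and the cited paper itself ships no commands.

```python
def backdrop_panels(rows, group_of):
    """Split rows into one (foreground, backdrop) pair per group.

    rows is a list of observations and group_of is a function giving
    each observation's category. For each category, foreground holds
    its own observations and backdrop holds everything else. Groups
    are kept in order of first appearance.
    """
    groups = []
    for row in rows:
        g = group_of(row)
        if g not in groups:
            groups.append(g)
    return {
        g: (
            [r for r in rows if group_of(r) == g],
            [r for r in rows if group_of(r) != g],
        )
        for g in groups
    }


# Hypothetical rows of (category, x, y).
rows = [("a", 1, 2), ("b", 3, 4), ("a", 5, 6), ("c", 7, 8)]
panels = backdrop_panels(rows, lambda r: r[0])
```

In each panel, the backdrop would be drawn first in a muted color and
the foreground on top in a stronger one, so that every subset is seen
in the context of the whole.
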
On Fri, Sep 30, 2011 at 2:56 PM, Stas Kolenikov wrote:

> There was an interesting question on data visualization on
> Stats.StackExchange: what are the efficient strategies for tweaking
> scatterplots depending on the data needs? Too much data makes it
> clogged, too little data (such as ordinal data) makes it too chunky,
> overly skewed data makes it sit in one corner, and there is a
> multitude of other things that need to be adjusted to make the
> display really informative.
> I would be especially curious to hear from Nick Cox and Michael
> Mitchell, I guess, as the greatest contributors to Stata graphics (and
> of course Vince V, but I don't think I've seen him on the list for a
> while).