Statalist



Re: st: Boxplot + line


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Boxplot + line
Date   Wed, 26 Sep 2007 11:16:23 +0100

Stata's version of the _categorical imperative_, although
not as catchy as that formulated by Immanuel Kant, applies
here. Otherwise known as Wiggins' Third Law, it states

"If the going gets tough with a categorical graph,
start [all] over [again] with -twoway-."

A categorical graph here means -graph box-, -graph hbar-, -graph bar-
or -graph dot-. I will not mention -graph pie-.

Otherwise put, Allan is right. -graph box- won't let you use -addplot()-
(-plot()- in Stata 8). -addplot()- superimposes one or more -twoway- plots
on top of another -twoway- graph. As -graph box- is not a -twoway- type,
-addplot()- is out of the question.

Moreover, although -dotplot- allows a crude representation of boxes,
it too does not allow -addplot()-. I guess that falls under the heading
of "Will anyone want that? Probably not."

No matter. There are several alternatives.
Allan can get arbitrarily close to what (he thinks) he wants.
In fact, he can get what may well appear better graphs than
what he is asking for.

A first possibility is to install -stripplot- from SSC.
As mentioned a while back on this list, -stripplot-
has a -box- option. In fact, it also has a -box()-
option with arguments for tuning the box.
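
Both user-written packages used in this post live on SSC. A minimal
sketch of installing them, assuming a net-aware Stata with internet
access (-egenmore- is only needed further down, for the adjacent values):

```stata
* install the user-written packages from SSC (needed once per installation)
ssc install stripplot
ssc install egenmore

* confirm which version of the ado-file Stata will use
which stripplot
```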

After

sysuse auto, clear

you can get this:

stripplot displacement , over(rep78) vertical ///
box(bfcolor(gs14) barw(0.2)) addplot(lfit displacement rep78, lcolor(black))

That is a scatter plot with boxes showing median
and quartiles with a regression line superimposed.

The result is no one's conventional box plot. It is a dot-box
plot (references in the help file, and more welcome). The idea
goes back at least as far as a suggestion of Jerry Dallal to Leland
Wilkinson.

The programmer of -stripplot- told me, in confidence, that
he does not like whiskers, so -stripplot- does not support whiskers.
That's also explicit in the help. In fact, he doesn't much like
box plots, which he asserts have become too widely used. Box plots
are often used for comparisons of only a few categories, when
usually a lot more detail could be shown helpfully, and without
confusing the reader.

But you can subvert that prejudice. You just need a little more
work.

First off, the quartiles are easy to get:

egen upq = pctile(displacement), by(rep78) p(75)
egen loq = pctile(displacement), by(rep78) p(25)

Next, what Tukey called the _adjacent values_, the ends of
the whiskers, are in fact quite easy to get too, but you
must install -egenmore- from SSC first:

egen uadj = adju(displacement), by(rep78)
egen ladj = adjl(displacement), by(rep78)
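
If installing -egenmore- is not an option, the adjacent values can be
sketched with official -egen- functions alone. This assumes Tukey's usual
definition (the most extreme data values lying within 1.5 times the
interquartile range of the nearer quartile), which is assumed here to
match what -adju()- and -adjl()- compute. The names iqr, uadj2 and ladj2
are made up for illustration; upq and loq are the quartiles computed above.

```stata
* interquartile range within each category
gen iqr = upq - loq

* dividing by a false (0) condition yields missing, which max() and min()
* ignore, so only values inside the fences compete
egen uadj2 = max(displacement / (displacement <= upq + 1.5 * iqr)), by(rep78)
egen ladj2 = min(displacement / (displacement >= loq - 1.5 * iqr)), by(rep78)

* one row per category, to check the results by eye
tabdisp rep78, c(loq upq ladj2 uadj2)
```

If -egenmore- is installed as well, comparing uadj2 with uadj and ladj2
with ladj is a quick check on that assumption.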

Now we can do this:

stripplot displacement , over(rep78) vertical ///
box(bfcolor(gs14) barw(0.2)) ms(none) ///
addplot( ///
scatter disp rep78 if disp < ladj | disp > uadj, mcolor(black) || ///
rspike upq uadj rep78, lcolor(black) || ///
rspike loq ladj rep78, lcolor(black) || ///
lfit displacement rep78 , lcolor(red) ///
)

In English, not Stata:

1. We draw a -stripplot- with a box and suppress all the marker
symbols.
2. On that we superimpose a scatter plot of the points beyond
the adjacent values.
3. On that we superimpose spikes connecting the quartiles and
the adjacent values.
4. On that we superimpose a regression line.

If you really want, you can cap the spikes too:

stripplot displacement , over(rep78) vertical ///
box(bfcolor(gs14) barw(0.2)) mcolor(black) ms(none) ///
addplot( ///
scatter disp rep78 if disp < ladj | disp > uadj, mcolor(black) || ///
rspike upq uadj rep78, lcolor(black) || ///
rspike loq ladj rep78, lcolor(black) || ///
rcap uadj uadj rep78 if uadj != upq, lcolor(black) || ///
rcap ladj ladj rep78 if ladj != loq, lcolor(black) || ///
lfit displacement rep78 , lcolor(red) ///
)

Note that you don't really need -stripplot-. You have the
quartiles and could draw those directly with -twoway rbar-.
But I started out with -stripplot- and kept playing.
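
A sketch of that -twoway- only route, assuming the variables upq, loq,
uadj and ladj created above already exist. The variable med, the bar
width and the colours are illustrative choices, not part of the recipe
above; -fcolor()- and -lcolor()- are the option names in current
documentation, and older releases may want -bfcolor()- and -blcolor()-
instead.

```stata
* median within each category; quartiles and adjacent values as computed earlier
egen med = median(displacement), by(rep78)

* two bars per category (median to upper quartile, median to lower quartile)
* make the box; their shared outline marks the median
twoway rbar med upq rep78, barw(0.4) fcolor(gs14) lcolor(black) || ///
       rbar med loq rep78, barw(0.4) fcolor(gs14) lcolor(black) || ///
       rspike upq uadj rep78, lcolor(black) || ///
       rspike loq ladj rep78, lcolor(black) || ///
       scatter displacement rep78 ///
           if displacement < ladj | displacement > uadj, ///
           ms(oh) mcolor(black) || ///
       lfit displacement rep78, lcolor(red) ///
       legend(off) ytitle("Displacement (cu. in.)")
```

Because everything here is a -twoway- plot, further plots can be added
with another || without any need for -addplot()-.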

Of course, Allan should get someone to pay for Stata 10. The graph
editor alone will give hours of endless amusement. As it happens,
all this is possible in Stata 8.2.

Nick
n.j.cox@durham.ac.uk

Allan Reese
--------------------------------------------------------------------------------

The manual notes "box charts are implicitly categorical" but your over variable may equally have ordered or true quantitative values. Given a boxplot with categories 1,2,3,... , is there a way to add a regression line over the boxes? addline offers only vertical and horizontal lines; addplot isn't available; and "|| scatteri" isn't allowed.

Any ideas please (still on v9, but this may be the spur to move it on up).
