
st: RE: graph box - missing outside values


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: graph box - missing outside values
Date   Tue, 12 Aug 2003 09:09:02 +0100

John Plummer

> I am using boxplots (graph box x1 x2 x3..., Stata version 8.1, Win 98)
> to plot several variables. Some plots seem not to show all the data.
> For example, for the following variable x:
> 
> . tab x
> 
>           x |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |        125       78.13       78.13
>           2 |         20       12.50       90.63
>           3 |         15        9.38      100.00
> ------------+-----------------------------------
>       Total |        160      100.00
> 
> the command "graph box x" shows only the median line at 1, with no
> whiskers or outside values to indicate the data points at 2 or 3.
> 
> Can anyone suggest how I might get boxplots showing the full range
> of the data?

With these data, the upper and lower quartiles 
are both 1, and so the so-called step, i.e. 
1.5 * (upper quartile - lower quartile), is 
0. So the values 2 and 3 lie beyond upper quartile 
+ step = 1, and should be plotted individually as 
outside values, as you imply. 
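
As a minimal sketch of that arithmetic, the following rebuilds
the 160 observations from the table above and computes the step
from the results saved by -summarize, detail-. The local macro
names (step, fence) are just illustrative choices, not anything
-graph box- itself uses.

```stata
* Rebuild the tabulated data: 125 ones, 20 twos, 15 threes
clear
set obs 160
gen x = 1
replace x = 2 in 126/145
replace x = 3 in 146/160

* Quartiles are saved as r(p25) and r(p75)
summarize x, detail
local step  = 1.5 * (r(p75) - r(p25))
local fence = r(p75) + `step'

display "step = `step'"
display "upper quartile + step = `fence'"

* Observations beyond the fence, i.e. candidate outside values
count if x > `fence'
```

With both quartiles equal to 1, the step is 0 and every
observation equal to 2 or 3 is counted as lying beyond the fence.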

I have two reactions: 

1. This looks like a bug. Somehow the IQR of 0 is 
getting trapped or ignored, either directly or 
indirectly. For example, if I type in Stata 

. gen x2 = x1 + smidgen 
. gra box x2 

then everything looks OK. (In practice I had 
to be less poetic: in place of -smidgen-, I 
find that e.g. 1e-6 * uniform() works all 
right.) You will also want the 
descriptive information for -x1- to show 
up on the graph: there are various ways 
to do this, of which a prior -copydesc x1 x2- 
is one. (-copydesc- is on SSC.) 

2. No box plot is going to be more than 
a line and two points with these data. I guess 
they are just data concocted to show the problem, 
but if your real data are anything like this, 
a discrete histogram or dotplot should do a better
job of showing the data. 
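
The two suggestions above might be sketched as follows, assuming
the variable is called x as in the original post. The name x2,
the double storage type, and copying the variable label with a
macro extended function (rather than -copydesc-) are assumptions
made for illustration. Newer Stata releases name the random
number function runiform() rather than uniform().

```stata
* Workaround: add a tiny random amount so that the IQR is no
* longer exactly 0 (double is used here to keep the small
* perturbation from being rounded away)
gen double x2 = x + 1e-6 * uniform()

* Carry the variable label of x over to x2, so the graph is
* still described in terms of the original variable
local lbl : variable label x
label variable x2 "`lbl'"

graph box x2

* Alternatives that show every distinct value directly
histogram x, discrete frequency
dotplot x
```

The perturbation is far too small to be visible on the graph; it
only serves to stop the box from collapsing to zero height.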

Nick 
n.j.cox@durham.ac.uk 
