The Stata listserver

st: RE: RE: vwidth in old boxplot graphics


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: vwidth in old boxplot graphics
Date   Tue, 23 Aug 2005 19:43:13 +0100

Here is a way of indicating group sizes in the axis labels of a 
box plot: 

. tab rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. * what follows is all one line 

. graph box mpg, over(rep78, 
relabel(1 `" "1" "(2)" "' 
        2 `" "2" "(8)" "' 
        3 `" "3" "(30)" "' 
        4 `" "4" "(18)" "' 
        5 `" "5" "(11)" "')) 

This could be automated. 
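A minimal sketch of that automation follows, assuming Stata 9 or later (for -levelsof-). The macro names relabel, levels and i are illustrative choices, not from the post. Note that -relabel()- numbers categories by their position along the axis (1, 2, ...), not by their values, which is why the loop keeps a separate counter; for rep78, coded 1 to 5, the two coincide.

```stata
sysuse auto, clear

* Build the -relabel()- argument: for each category of rep78, a two-line
* label with the category value above its count in parentheses.
* Counts use observations with mpg and rep78 both nonmissing, which are
* the observations -graph box- uses here. Assumes Stata 9+ (-levelsof-).
local relabel
local i = 0
quietly levelsof rep78 if !missing(mpg), local(levels)
foreach l of local levels {
    local i = `i' + 1
    quietly count if rep78 == `l' & !missing(mpg)
    local relabel `relabel' `i' `" "`l'" "(`r(N)')" "'
}

* -relabel()- refers to the position of each category, hence the counter
graph box mpg, over(rep78, relabel(`relabel'))
```

On the auto data, where mpg has no missing values, this should reproduce the labels of the hand-written command above.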


Nick 
n.j.cox@durham.ac.uk 

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Nick Cox
> Sent: 23 August 2005 18:53
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: vwidth in old boxplot graphics
> 
> 
> It is still available in Stata as -graph7, box vwidth-. 
> 
> Programming this yourself in Stata 8 or 9 would be possible, 
> I guess, but not quite trivial. 
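For what it is worth, one rough way of getting variable-width boxes in Stata 9 or later with official commands is sketched below. It draws two -twoway rbar- plots per category (quartile to median on each side), so that each box can have its own -barwidth()-. Widths proportional to the square root of group size are an assumed convention; the post does not say how -graph7, box vwidth- scales them. Whiskers and outliers are omitted, and the variable names loq, med, upq, n and tag are illustrative.

```stata
sysuse auto, clear
keep if !missing(mpg, rep78)

* Quartiles, median and size of each rep78 group;
* -tag- flags one observation per group for plotting
egen loq = pctile(mpg), by(rep78) p(25)
egen med = median(mpg), by(rep78)
egen upq = pctile(mpg), by(rep78) p(75)
egen n   = count(mpg), by(rep78)
egen tag = tag(rep78)
summarize n, meanonly
local nmax = r(max)

* Two -rbar- plots per group share the median as a common edge.
* Bar width 0.8 * sqrt(n / nmax) is an assumed scaling,
* not necessarily the one graph7 uses.
local plots
quietly levelsof rep78, local(levels)
foreach l of local levels {
    summarize n if rep78 == `l', meanonly
    local w = 0.8 * sqrt(r(max) / `nmax')
    local opts barwidth(`w') bcolor(gs14) blcolor(black)
    local plots `plots' (rbar loq med rep78 if rep78 == `l' & tag, `opts')
    local plots `plots' (rbar med upq rep78 if rep78 == `l' & tag, `opts')
}

twoway `plots', legend(off) xlabel(`levels') ytitle("`: variable label mpg'")
```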
> 
> In my view, wanting something like this touches on a key 
> limitation of box plots: they often leave out far too much 
> detail. Box plots, I suggest, are optimal when the number of 
> groups concerned is about 10 or more and severe compression is needed to
> see "the wood for the trees". With fewer groups, more detail 
> is often tolerable and even highly desirable. 
> 
> (Incidentally, an example which I owe to a Howard Wainer paper is 
> instructive. How do you interpret a box plot like this? 
> 
>          +---------------+------------------+
>     +----|               |                  |----+ 
>          +---------------+------------------+
> 
> Most people I have asked go for a diagnosis of a short-tailed 
> distribution. This forgets that if half the distribution
> is inside the box, then the other half must be outside. In 
> this case, the average density in the tails must be much higher than 
> in the centre and the best guess has to be a U-shaped distribution. 
> 
> Boxplots can be much harder to interpret than you think!) 
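The argument can be made concrete with one distribution chosen purely for illustration (it is not necessarily the one in Wainer's paper): the U-shaped arcsine distribution on [0, 1], with quantile function Q(p) below. Assuming a large sample, the whiskers run out to about 0 and 1, since the tails (about 0.15 long) are well inside 1.5 IQR of the box.

```latex
\begin{aligned}
Q(p) &= \sin^2\!\left(\tfrac{\pi p}{2}\right), \qquad
Q(0.25) = \tfrac{2-\sqrt{2}}{4} \approx 0.146, \qquad
Q(0.75) = \tfrac{2+\sqrt{2}}{4} \approx 0.854, \\
\text{box length (IQR)} &= \tfrac{\sqrt{2}}{2} \approx 0.707, \qquad
\text{each whisker} = \tfrac{2-\sqrt{2}}{4} \approx 0.146, \\
\text{mean density in box} &= \frac{0.5}{\sqrt{2}/2} = \frac{1}{\sqrt{2}} \approx 0.71, \qquad
\text{mean density in tails} = \frac{0.5}{(2-\sqrt{2})/2} = \frac{1}{2-\sqrt{2}} \approx 1.71.
\end{aligned}
```

So the tails are on average about 1 + sqrt(2), roughly 2.4, times as dense as the box, even though the long box dominates the picture.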
> 
> Alternatives such as -dotplot- (official), 
> -onewayplot- (user-written) and -beamplot- (user-written), 
> which are dedicated to showing one symbol for every 
> data point, have the advantage of giving a clear 
> impression of the number of data points. 
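For the auto data used elsewhere in this thread, minimal calls might look like the sketch below. -dotplot- with -over()- is official; the -onewayplot- line assumes the user-written package is installed (it was on SSC at the time) and uses only the -by()- and -msy()- options that appear in the program quoted further down.

```stata
sysuse auto, clear

* Official: one column of dots per repair record, one dot per car
dotplot mpg, over(rep78)

* User-written -onewayplot- (assumed installed): one marker per car per group
onewayplot mpg, by(rep78) msy(+)
```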
> 
> Yet further plots are possible showing all the quantiles. 
> -quantile- (official) is here less flexible than -qplot- 
> (user-written). The next issue of the Stata Journal will
> carry a long diatribe on quantile plots and will 
> be accompanied by an enhanced version of -qplot- (and 
> also of -distplot-, also user-written). 
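To show what such a plot displays, here is a bare-bones quantile plot by group using official commands only, assuming Stata 9 or later. Each value is plotted against its plotting position (i - 0.5)/n within its group, where i is its rank and n the group size. That plotting position is one common convention, assumed here; -quantile- and -qplot- may use their own. The variable name pp is illustrative.

```stata
sysuse auto, clear
keep if !missing(mpg, rep78)

* Plotting position (i - 0.5) / n: i is the rank of mpg within its
* rep78 group (ties broken arbitrarily), n is the group size
bysort rep78 (mpg): generate pp = (_n - 0.5) / _N
label variable pp "fraction of the data"

* One quantile plot per group: ordered values against plotting positions
twoway scatter mpg pp, by(rep78) ytitle("`: variable label mpg'")
```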
> 
> A yet further possibility is to hybridise dot and box 
> plots. One example is given by Wild and Seber, "Chance 
> Encounters", p. 122. 
> 
> Fortuitously, just this afternoon a colleague and I 
> came up with our own hybrid. This assumes a categorical 
> variable coded by successive integers. I wouldn't defend 
> this default design to the limit as it was entirely 
> optimised for one particular dataset. However, the main 
> point is that your own hybrid design is attainable 
> with some coding. (Varying-width boxes do sound a bit 
> harder.) 
> 
> For this to work, you need -onewayplot- from SSC. 
> 
> Silly example: 
> 
> . sysuse auto
> . myboxplot mpg rep78, magic(-0.2) rbar(barw(0.15)) ysc(noreverse) stack h(0.5)
> 
> *! 1.0.0 NJC/ISE 23 Aug 2005 
> program myboxplot 
> 	version 9 
> 	// usage: myboxplot yvar catvar, where catvar is coded 1, 2, 3, ...
> 	syntax varlist(min=2 max=2 numeric) [if] [in] ///
> 	[, magic(real 0.4) rbar(str asis) * ] 
> 	marksample touse
> 	qui count if `touse' 
> 	if r(N) == 0 error 2000
> 
> 	tokenize `varlist' 
> 	args y cat 
> 
> 	// median and quartiles of y within each category; boxes are drawn
> 	// at cat + magic(), i.e. shifted off the category values
> 	tempvar median upq loq offset 
> 	qui { 
> 		egen `median' = median(`y') if `touse', by(`cat')
> 		egen `loq' = pctile(`y') if `touse', by(`cat') p(25)
> 		egen `upq' = pctile(`y') if `touse', by(`cat') p(75)
> 		gen `offset' = `cat' + `magic'  
> 	}
> 	// data points from -onewayplot-; each box is two horizontal bars,
> 	// median to upper quartile and median to lower quartile
> 	onewayplot `y' if `touse', by(`cat') msy(+) msize(small) ///
> 	plot(rbar `upq' `median' `offset',            ///
> 	barw(0.25) blcolor(black) bcolor(gs14) hor legend(off) `rbar' ///
> 	|| rbar `loq' `median' `offset',              ///
> 	barw(0.25) bcolor(gs14) blcolor(black) hor `rbar') ///
> 	xti("`: variable label `y''") ysc(reverse) yla(, noticks) yti("") `options'  
> end 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


