st: -pdplot- available from SSC for Pareto dot plots

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: -pdplot- available from SSC for Pareto dot plots
Date	Fri, 17 Nov 2006 11:16:14 -0000

Thanks to Kit Baum, a new module -pdplot- for Pareto dot plots is now
available from SSC. Stata 9 is required. Install with -ssc- if 
interested. 

-pdplot- produces a Pareto dot plot as proposed by  

Wilkinson, Leland. 2006. Revising the Pareto chart. 
American Statistician 60(4): 332-334. 

The frequencies of the categories of a categorical variable are shown in
order by a series of dots against a magnitude scale. As backdrop,
corresponding acceptance intervals are shown by bars. 

The command is more flexible than this description of default behaviour
implies. The intervals can be suppressed and the dot plot can be 
-recast()- to another kind of -twoway- plot. 

Wilkinson (2006) briefly reviews Pareto charts, which commonly combine
two displays in one. Frequencies in various categories are shown by a
series of bars arranged in frequency order, from most common downwards.
On that is often superimposed a rising curve showing cumulative
frequency.  Frequency and cumulative frequency may or may not have
consistent scales.  Examples from quality management studies often show
categories of accidents, complaints, defects, failures, rejects,
returns, or other such unwelcome phenomena. Wilkinson gives several
cogent criticisms of this design and suggests an alternative: show
frequencies in order, but by a dot plot, and add as reference a series
of acceptance intervals. (Indeed Wilkinson's paper is important 
reading for anyone tempted to use Pareto charts.) 

The acceptance intervals are calculated by simulation. Imagine as
benchmark a population in which k categories are equally probable, and
imagine taking samples of size n. Here k and n are the same as those in
the data under consideration. Just by chance the observed frequencies of
the k categories will typically differ. For each sample we can label the
frequencies f_(1) >= f_(2) >= ... >= f_(k-1) >= f_(k): thus f_(1) is the
frequency of the most abundant category, and so forth. Across several
samples we can get order statistics for each f_(j) and use those to
calculate intervals with desired coverage. 
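
The simulation just described can be sketched in a few lines. This is
only an illustration in Python, not the Mata code inside -pdplot-. The
function name, the number of replications, the seed and the simple
empirical-percentile rule for the interval limits are all assumptions
made for the example.

```python
import random

def acceptance_intervals(n, k, reps=1000, coverage=0.95, seed=12345):
    """Hypothetical helper: intervals for ranked frequencies f_(1)..f_(k)
    when k categories are equally probable and the sample size is n."""
    rng = random.Random(seed)
    # sims[j] collects the (j+1)-th largest frequency from each sample
    sims = [[] for _ in range(k)]
    for _ in range(reps):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1   # each category equally likely
        counts.sort(reverse=True)           # f_(1) >= f_(2) >= ... >= f_(k)
        for j, f in enumerate(counts):
            sims[j].append(f)
    # Assumed rule: take empirical percentiles of the simulated values
    alpha = (1 - coverage) / 2
    lo_idx = int(alpha * (reps - 1))
    hi_idx = int(round((1 - alpha) * (reps - 1)))
    intervals = []
    for draws in sims:
        draws.sort()
        intervals.append((draws[lo_idx], draws[hi_idx]))
    return intervals

if __name__ == "__main__":
    # Example: n = 100 observations spread over k = 5 categories
    for rank, (lo, hi) in enumerate(acceptance_intervals(100, 5), start=1):
        print(f"f_({rank}): {lo} to {hi}")
```

An observed frequency lying outside the interval for its rank is one
that would be unusual if all categories were equally probable.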

After seeing this graph in the latest issue of _The American
Statistician_, it seemed to me a nice project to implement 
it in Stata. Graphically, it is a case of superimposing a -twoway 
scatter- on a -twoway rbar-, with the opportunities that such a
choice allows for -recast()-ing to other -twoway- forms. The 
underlying simulations are best done using Mata. 

William Gould and Vince Wiggins made very helpful suggestions. 

This program may interest those whose work brings them into 
territory in which Pareto charts are used. For example, they 
appear fairly common in some parts of the health sciences. However, 
it is not intended as a general display for categorical frequencies. 
-catplot- and -tabplot- from SSC have more pretensions
to that role. Nor does it apply if your data are proportions 
or percents or measurements, rather than instances or counts of
categories. 

Nick 
[email protected] 

