Statalist



st: -bandplot- available from SSC


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: -bandplot- available from SSC
Date   Mon, 24 Nov 2008 14:51:24 -0000

Thanks to Kit Baum, a new package -bandplot- is available from SSC. 

The name "band plot" is my own. I wrote this program to do what it does
and needed a name. I toyed briefly with names like -suygrxplot- but
decided that I would probably forget such names myself. There is a small
risk of this program being misunderstood as drawing one or more
(coloured) bands to represent stacked series, but that is not what I
am about. 

-bandplot- requires Stata 8. Some more details follow my signature, but
the help file includes them all (and more). You can install -bandplot-
using 

. ssc inst bandplot 

as usual. 

The package includes a rudimentary demo file, bandplottest.do. Open a
new directory or folder and run it to see some example graphs. 

Nick 
[email protected] 

-bandplot- produces plots showing summary statistics of one or more
response variables for bands of one or more predictor variables.

By default, -bandplot- is a wrapper for -graph dot-. Optionally,
-bandplot- can instead act as a wrapper for -graph hbar- or -graph bar-.

There are two syntaxes. In the first, -bandplot- takes the first
variable in a varlist to be a response variable yvar, which is
summarised for observations in each of various bands of the other
predictor variables xvars. In the second, -bandplot- takes two or more
variables specified first within parentheses () as being response
variables yvars; all subsequent variables are then taken to be
predictors xvars.
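
As a rough sketch of the two syntaxes with the auto data (the choice of
variables here is purely illustrative, and the calls assume only the
syntax just described; check the help file for the details):

. sysuse auto, clear

. * first syntax: one response, summarised for bands of two predictors
. bandplot mpg weight length

. * second syntax: two responses in parentheses, then the predictors
. bandplot (mpg trunk) weight length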

By default, -bandplot- shows means. Any other statistics produced by
-summarize- may be specified. Note that with two or more yvars only one
statistic may be shown.

"Bands" are to be interpreted as follows. By default numeric variables
are divided into quantile-based bands. (More precisely, the default is
quartile-based bands.) Alternatively, variables can be declared
explicitly or implicitly as categorical, in which case the distinct
values of each such variable are used as bands. Any string variables
specified as xvars are treated as categorical, regardless of any other
specifications.  No string variables may be specified as yvars.

The idea of showing summaries of responses for bands of one or more
predictors evidently has a long history, which is difficult to trace.
Plots summarizing polls or elections in terms of votes for major parties
or candidates broken down separately by categorical variables such as
sex, age, race or region are common. The particular choices here were
inspired largely by examples given by Harrell (2001). See his pp. 126,
303f, 314f, 336.

What -bandplot- offers is perhaps best explained by a direct comparison
with -graph dot-. There are three major differences and several minor
differences. (Similar comments apply to -graph bar- or -graph hbar- if
either is invoked.)

First, consider an example with the auto data. Compare

. graph dot (mean) mpg, over(foreign) over(rep78)

and

. bandplot mpg foreign rep78, cat(foreign rep78)

The -graph dot- command shows means of mpg for the cross-combinations of
foreign and rep78 occurring in the data, i.e. one variable's classes
are nested inside the other's. The -bandplot- command shows means of mpg
separately for classes of each variable.

Second, -bandplot- supports quantile-based bands on the fly. You could
show those with -graph dot-, but you would need to create any variables
classed into bands first, say by using -xtile-.
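
To make the comparison concrete, here is a sketch with the auto data.
The variable name weight4 is made up for this example, and the
-bandplot- call relies on the default of quartile-based bands described
above:

. sysuse auto, clear

. * with -graph dot-, the banded variable must be created first
. xtile weight4 = weight, nquantiles(4)
. graph dot (mean) mpg, over(weight4)

. * with -bandplot-, the quartile bands are produced on the fly
. bandplot mpg weight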

Third, -graph dot- typically carries out a temporary reduction of the
dataset, but -bandplot- carries out its own reduction and passes the
results to -graph dot- for plotting -asis-. Various options of -graph
dot- are thus irrelevant or inappropriate so far as -bandplot- is
concerned. Further, variables in the dataset are not accessible to the
-graph dot- command.

-bandplot- does not offer any rounding or coarsening option such as
might be used to bin numeric variables into equal intervals. You would
need to do that first. My advice is to use -clonevar- to create a copy of a
variable (notably, keeping the variable label) and then to replace that
with a binned version using a function such as -round()-, -floor()- or
-ceil()-. Then declare such variables to -bandplot- as categorical
[sic].
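
A minimal sketch of that advice, again with the auto data. The name
weight500 and the bin width of 500 lb are arbitrary choices for
illustration only:

. sysuse auto, clear

. * copy weight, keeping its variable label
. clonevar weight500 = weight

. * coarsen to the lower limits of 500 lb bins
. replace weight500 = 500 * floor(weight500/500)

. * declare the binned copy as categorical
. bandplot mpg weight500, cat(weight500)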

Although -bandplot- ignores missing values on the yvars, the structure
of such missing values may be explored by creating an indicator for
missingness using -missing()-.
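
For example, rep78 has some missing values in the auto data. In the
sketch below (the name rep78miss is made up), the mean of the 0/1
indicator shown for each band is just the fraction of observations
with rep78 missing:

. sysuse auto, clear

. * 1 if rep78 is missing, 0 otherwise
. generate byte rep78miss = missing(rep78)

. bandplot rep78miss foreign, cat(foreign)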


Harrell, F.E. 2001.  Regression Modeling Strategies: With Applications
to Linear Models, Logistic Regression, and Survival Analysis.  New York:
Springer.
