# Re: Re: st: programming graphs in stata

 From n j cox <[email protected]> To [email protected] Subject Re: Re: st: programming graphs in stata Date Wed, 17 Oct 2007 19:52:26 +0100

indicated a specific interest in spineplots. I responded
with some general pontifications ex cathedra and Scott
Merryman and then Maarten Buis (below) responded with
real code. Late last night I recalled that bars of
unequal width are needed for the equal-probability histograms
beloved of Marcello Pagano and implemented in -eqprhistogram-
(available from SSC from a few years back).

The trick then was to use the (still undocumented) -bartype(spanning)- option. (I owe this tip to Vince Wiggins.) And the real code offered showed that good results were possible in a few dozen lines. Hence
the challenge was to go a step further and produce a program.
Whether -bartype(spanning)- really is the best way to do it overall
I am not sure. I had to re-discover from -eqprhistogram- how it
works. Anyway, this is offered to those on Stata 9 or 10.

*! NJC 1.0.0 17 Oct 2007
program spineplot
version 9
forval i = 1/20 {
local baropts `baropts' bar`i'(str asis)
}
syntax varlist(min=2 max=2 numeric) [fweight] [if] [in] ///
[, MISSing `baropts' barall(str asis) * ]

quietly {
if "`missing'" != "" marksample touse, novarlist
else marksample touse
count if `touse'
if r(N) == 0 error 2000

tokenize "`varlist'"
args y x

preserve
tempvar f xf xF xmid tag yF xshow
contract `x' `y' if `touse' [`exp'], zero f(`f')

bysort `x' (`y'): gen `xf' = sum(`f')
by `x' : replace `xf' = `xf'[_N]
by `x' : gen byte `tag' = _n == 1
gen `xF' = sum(`xf' * `tag')
by `x': gen `xmid' = `xF'[_N] - 0.5 * `xf'[_N]
local total = `xF'[_N]

by `x' : gen `yF' = sum(`f')
by `x' : replace `yF' = 100 * `yF'/ `yF'[_N]

local first = _N + 1
expand 2 if `x' == `x'[_N]
replace `yF' = . in `first'/l
sort `y' `x' `yF'
by `y' : gen `xshow' = cond(_n == 1, 0, `xF'[_n - 1])

forval i = 0/4 {
local xla1 = `i' * `total'/4
local xla2 = `i' * 25
local xla `" `xla' `xla1' "`xla2'" "'
}

local xtitle : var label `x'
if `"`xtitle'"' == "" local xtitle `x'
label var `xshow' `"percent by `xtitle'"'
local ytitle : var label `y'
if `"`ytitle'"' == "" local ytitle `y'
label var `yF' `"percent by `ytitle'"'

levelsof `x', local(levels)
foreach l of local levels {
su `xmid' if `x' == `l', meanonly
local l : label (`x') `l'
local XLA `" `XLA' `r(min)' "`l'" "'
}

drop `f' `xf' `xF' `xmid' `tag'

levelsof `y', local(levels)
local ny : word count `levels'
tokenize `levels'
forval i = `ny'(-1)1 {
local pipe = cond(`i' > 1, "||", "")
local I = `ny' - `i' + 1

local call `call' ///
bar `yF' `xshow' if `y' == ``i'' ///
, bartype(spanning) blcolor(gs14) blw(medium) ///
`barall' `bar`I'' `pipe'
local which : label (`y') ``i''
local lgnd `lgnd' `I' `"`which'"'
}
}

twoway `call' ///
xaxis(1 2) ///
xsc(r(0, `total') axis(1)) ///
xsc(r(0, `total') axis(2)) ///
xla(`XLA', axis(2) noticks) ///
xla(`xla', axis(1)) ///
xtitle(`"`xtitle'"', axis(2)) ///
yla(0(25)100, ang(h) nogrid) ///
legend(order(`lgnd') col(1) pos(3)) ///
`options'
end

Anyone who wants to play before I produce a proper help file
should note that the syntax is

spineplot yvar xvar [fweight] [if] [in]
[, bar1(<twoway_bar_options>) ... bar20(<twoway_bar_options>)
barall(<twoway_bar_options>) MISSing <twoway_options> ]

The two variables yvar and xvar must be numeric and will
usually be categorical variables with value labels and
just a few categories. Experiment with the canonical examples

sysuse auto, clear
spineplot foreign rep78
spineplot rep78 foreign

If -foreign- is yvar and -rep78- is xvar, then each vertical bar is
subdivided in 2 according to the frequencies of the 5 categories of -foreign-. Conversely, if -foreign- is xvar, each vertical bar is
subdivided in 5 according to the categories of -rep78-. Vertical
extents show percent(yvar categories | xvar category) and horizontal
extents show frequency or percent of xvar categories. Even with
a relatively simple twoway table, I find that the 2 y categories X
5 x categories layout works much better than the 5 X 2 layout. Anyway, as many people work with binary responses, the former will often appear natural and congenial.

The program works by -contract-ing the data on the fly and
calculating cumulative frequencies. (Maarten uses -collapse-
in a slightly devious way. -contract- is more direct and
typically a bit faster.) The graph is produced
by overlaying distinct -twoway bar, bartype(spanning)-
for each category of yvar. Bars "at the back" are laid down
first. By default, each bar has
-blcolor(gs14) blw(medium)- which I find sufficiently
distinctive to outline each bar. (Maarten's code tackles
the same issue by trimming the bars so that there are
gaps between.) By default the categories of yvar will
be distinguished according to the -scheme- you have in
operation. With the default -s2color- the effect is mildly
unattractive fruit salad, so for a serious graph you might want
to step in with something more delicate, such as various
grey scales.

Options -bar1()- to -bar20()- are provided to allow overriding
the defaults on categories 1, ..., 20. The number 20 is
plucked out of the air as more than I guess anyone really wants, but those so minded can hack at the code of a cloned version to
overrule this limit.

The option -barall()- is available to override the
defaults for all bars. Thus if you wanted thicker
-blw()- on all you would add -barall(blw(thick))-.

Other defaults include -legend(col(1) pos(3))-. At
least with -s2color- this implies an approximately
square plot region which can look quite good, although
no doubt there are datasets for which it is a bad idea.

I played a little with the idea of adding a -by()- option,
giving the possibility of an array of such plots. That's
a little hard to do nicely, so you are probably better off
producing separate graphs and then combining them, with
some polishing in the graph editor (Stata 10 only).

Nick
[email protected]

Maarten Buis

--- Scott Merryman <[email protected]> wrote:
> Do you mean something like this?
<snip>

Matthias was looking for a spineplot, and Scott's code got close to
that, but is not a spineplot. The code below should give you a
spineplot (and a barplot and a combination of the two for comparison).

*---------------- begin example --------------------
sysuse auto,clear
collapse (count) mpg, by(rep foreign)
drop if rep78 >=.
reshape wide mpg, i(rep78) j(foreign)
graph bar (asis) mpg1 mpg0, ///
over(rep78) stack ///
legend(order(1 "foreign" 2 "domestic")) ///
title("bar plot") ///
bar(1, color(gs8)) bar(2,color(gs12)) ///
name(bar,replace)

sysuse auto, clear
collapse (count) mpg (mean) foreign, by(rep)
keep if rep !=.
sum mpg
gen freq = .95*mpg/r(sum)
sort rep78
gen gr_order = _n - 1
expand 2
bysort rep: gen t = _n

gen gap = gr_order * .05/4
gen sum = sum(fre) if t ==2
replace sum = sum[_n-1] if t == 1
replace sum = 0 in 1
replace sum = sum + gap

forvalues i = 0/4 {
local ibegin = `i' * 2 + 1
local iend = `i' * 2 + 2
local begin = sum[`ibegin']
local end = sum[`iend']
local mid = `begin' + (`end' - `begin')/2
local lab = rep78[`ibegin']
local xlab `"`xlab' `mid' "`lab'""'
}

gen foo = 1

twoway area foo sum if rep <=1, color(gs12) ///
|| area foreign sum if rep <=1, color(gs8) ///
|| area foo sum if rep <=2 & rep>1 , color(gs12) ///
|| area foreign sum if rep <=2 & rep>1, color(gs8) ///
|| area foo sum if rep <=3 & rep>2 , color(gs12) ///
|| area foreign sum if rep <=3 & rep>2 , color(gs8) ///
|| area foo sum if rep <=4 & rep>3 , color(gs12) ///
|| area foreign sum if rep <=4 & rep>3, color(gs8) ///
|| area foo sum if rep <=5 & rep>4 , color(gs12) ///
|| area foreign sum if rep <=5 & rep>4, color(gs8) ///
||, legend(order(1 "domestic" 2 "foreign")) ///
xlabel(`xlab') xtitle("") ///
ylab(0(.25)1) ytitle("") ///
title("spineplot") ///
name(spine, replace)

grc1leg bar spine
*--------------------- end example -----------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/