[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
n j cox <n.j.cox@durham.ac.uk> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: Re: st: programming graphs in stata |

Date |
Wed, 17 Oct 2007 19:52:26 +0100 |

Matthias Schonlau asked about programming graphs and

indicated a specific interest in spineplots. I responded

with some general pontifications ex cathedra and Scott

Merryman and then Maarten Buis (below) responded with

real code. Late last night I recalled that bars of

unequal width are needed for the equal-probability histograms

beloved of Marcello Pagano and implemented in -eqprhistogram-

(available from SSC from a few years back).

The trick then was to use the (still undocumented) -bartype(spanning)- option. (I owe this tip to Vince Wiggins.) And the real code offered showed that good results were possible in a few dozen lines. Hence

the challenge was to go a step further and produce a program.

Whether -bartype(spanning)- really is the best way to do it overall

I am not sure. I had to re-discover from -eqprhistogram- how it

works. Anyway, this is offered to those on Stata 9 or 10.

------------------------------- spineplot.ado

*! NJC 1.0.0 17 Oct 2007

program spineplot

version 9

forval i = 1/20 {

local baropts `baropts' bar`i'(str asis)

}

syntax varlist(min=2 max=2 numeric) [fweight] [if] [in] ///

[, MISSing `baropts' barall(str asis) * ]

quietly {

if "`missing'" != "" marksample touse, novarlist

else marksample touse

count if `touse'

if r(N) == 0 error 2000

tokenize "`varlist'"

args y x

preserve

tempvar f xf xF xmid tag yF xshow

contract `x' `y' if `touse' [`exp'], zero f(`f')

bysort `x' (`y'): gen `xf' = sum(`f')

by `x' : replace `xf' = `xf'[_N]

by `x' : gen byte `tag' = _n == 1

gen `xF' = sum(`xf' * `tag')

by `x': gen `xmid' = `xF'[_N] - 0.5 * `xf'[_N]

local total = `xF'[_N]

by `x' : gen `yF' = sum(`f')

by `x' : replace `yF' = 100 * `yF'/ `yF'[_N]

local first = _N + 1

expand 2 if `x' == `x'[_N]

replace `yF' = . in `first'/l

sort `y' `x' `yF'

by `y' : gen `xshow' = cond(_n == 1, 0, `xF'[_n - 1])

forval i = 0/4 {

local xla1 = `i' * `total'/4

local xla2 = `i' * 25

local xla `" `xla' `xla1' "`xla2'" "'

}

local xtitle : var label `x'

if `"`xtitle'"' == "" local xtitle `x'

label var `xshow' `"percent by `xtitle'"'

local ytitle : var label `y'

if `"`ytitle'"' == "" local ytitle `y'

label var `yF' `"percent by `ytitle'"'

levelsof `x', local(levels)

foreach l of local levels {

su `xmid' if `x' == `l', meanonly

local l : label (`x') `l'

local XLA `" `XLA' `r(min)' "`l'" "'

}

drop `f' `xf' `xF' `xmid' `tag'

levelsof `y', local(levels)

local ny : word count `levels'

tokenize `levels'

forval i = `ny'(-1)1 {

local pipe = cond(`i' > 1, "||", "")

local I = `ny' - `i' + 1

local call `call' ///

bar `yF' `xshow' if `y' == ``i'' ///

, bartype(spanning) blcolor(gs14) blw(medium) ///

`barall' `bar`I'' `pipe'

local which : label (`y') ``i''

local lgnd `lgnd' `I' `"`which'"'

}

}

twoway `call' ///

xaxis(1 2) ///

xsc(r(0, `total') axis(1)) ///

xsc(r(0, `total') axis(2)) ///

xla(`XLA', axis(2) noticks) ///

xla(`xla', axis(1)) ///

xtitle(`"`xtitle'"', axis(2)) ///

yla(0(25)100, ang(h) nogrid) ///

legend(order(`lgnd') col(1) pos(3)) ///

`options'

end

-------------------------- spineplot.ado

Anyone who wants to play before I produce a proper help file

should note that the syntax is

spineplot yvar xvar [fweight] [if] [in]

[, bar1(<twoway_bar_options>) ... bar20(<twoway_bar_options>)

barall(<twoway_bar_options>) MISSing <twoway_options> ]

The two variables yvar and xvar must be numeric and will

usually be categorical variables with value labels and

just a few categories. Experiment with the canonical examples

sysuse auto, clear

spineplot foreign rep78

spineplot rep78 foreign

If -foreign- is yvar and -rep78- is xvar, then each vertical bar is

subdivided in 2 according to the frequencies of the 5 categories of -foreign-. Conversely, if -foreign- is xvar, each vertical bar is

subdivided in 5 according to the categories of -rep78-. Vertical

extents show percent(yvar categories | xvar category) and horizontal

extents show frequency or percent of xvar categories. Even with

a relatively simple twoway table, I find that the 2 y categories X

5 x categories layout works much better than the 5 X 2 layout. Anyway, as many people work with binary responses, the former will often appear natural and congenial.

The program works by -contract-ing the data on the fly and

calculating cumulative frequencies. (Maarten uses -collapse-

in a slightly devious way. -contract- is more direct and

typically a bit faster.) The graph is produced

by overlaying distinct -twoway bar, bartype(spanning)-

for each category of yvar. Bars "at the back" are laid down

first. By default, each bar has

-blcolor(gs14) blw(medium)- which I find sufficiently

distinctive to outline each bar. (Maarten's code tackles

the same issue by trimming the bars so that there are

gaps between.) By default the categories of yvar will

be distinguished according to the -scheme- you have in

operation. With the default -s2color- the effect is mildly

unattractive fruit salad, so for a serious graph you might want

to step in with something more delicate, such as various

grey scales.

Options -bar1()- to -bar20()- are provided to allow overriding

the defaults on categories 1, ..., 20. The number 20 is

plucked out of the air as more than I guess anyone really wants, but those so minded can hack at the code of a cloned version to

overrule this limit.

The option -barall()- is available to override the

defaults for all bars. Thus if you wanted thicker

-blw()- on all you would add -barall(blw(thick))-.

Other defaults include -legend(col(1) pos(3))-. At

least with -s2color- this implies an approximately

square plot region which can look quite good, although

no doubt there are datasets for which it is a bad idea.

I played a little with the idea of adding a -by()- option,

giving the possibility of an array of such plots. That's

a little hard to do nicely, so you are probably better off

producing separate graphs and then combining them, with

some polishing in the graph editor (Stata 10 only).

Nick

n.j.cox@durham.ac.uk

Maarten Buis

--- Scott Merryman <scott.merryman@gmail.com> wrote:

> Do you mean something like this?

<snip>

Matthias was looking for a spineplot, and Scott's code got close to

that, but is not a spineplot. The code below should give you a

spineplot (and a barplot and a combination of the two for comparison).

*---------------- begin example --------------------

sysuse auto,clear

collapse (count) mpg, by(rep foreign)

drop if rep78 >=.

reshape wide mpg, i(rep78) j(foreign)

graph bar (asis) mpg1 mpg0, ///

over(rep78) stack ///

legend(order(1 "foreign" 2 "domestic")) ///

title("bar plot") ///

bar(1, color(gs8)) bar(2,color(gs12)) ///

name(bar,replace)

sysuse auto, clear

collapse (count) mpg (mean) foreign, by(rep)

keep if rep !=.

sum mpg

gen freq = .95*mpg/r(sum)

sort rep78

gen gr_order = _n - 1

expand 2

bysort rep: gen t = _n

gen gap = gr_order * .05/4

gen sum = sum(fre) if t ==2

replace sum = sum[_n-1] if t == 1

replace sum = 0 in 1

replace sum = sum + gap

forvalues i = 0/4 {

local ibegin = `i' * 2 + 1

local iend = `i' * 2 + 2

local begin = sum[`ibegin']

local end = sum[`iend']

local mid = `begin' + (`end' - `begin')/2

local lab = rep78[`ibegin']

local xlab `"`xlab' `mid' "`lab'""'

}

gen foo = 1

twoway area foo sum if rep <=1, color(gs12) ///

|| area foreign sum if rep <=1, color(gs8) ///

|| area foo sum if rep <=2 & rep>1 , color(gs12) ///

|| area foreign sum if rep <=2 & rep>1, color(gs8) ///

|| area foo sum if rep <=3 & rep>2 , color(gs12) ///

|| area foreign sum if rep <=3 & rep>2 , color(gs8) ///

|| area foo sum if rep <=4 & rep>3 , color(gs12) ///

|| area foreign sum if rep <=4 & rep>3, color(gs8) ///

|| area foo sum if rep <=5 & rep>4 , color(gs12) ///

|| area foreign sum if rep <=5 & rep>4, color(gs8) ///

||, legend(order(1 "domestic" 2 "foreign")) ///

xlabel(`xlab') xtitle("") ///

ylab(0(.25)1) ytitle("") ///

title("spineplot") ///

name(spine, replace)

grc1leg bar spine

*--------------------- end example -----------------------

(For more on how to use examples I sent to the Statalist, see

http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**RE: Re: st: programming graphs in stata***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

- Prev by Date:
**st: Problems installing xStata 10 on AIX** - Next by Date:
**st: orse updated** - Previous by thread:
**Re: st: programming graphs in stata** - Next by thread:
**RE: Re: st: programming graphs in stata** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |