Statalist The Stata Listserver



st: RE: how to overlap histograms


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: how to overlap histograms
Date   Tue, 21 Nov 2006 18:06:44 -0000

A simple example of the procedure comes from the same data. 

sysuse auto, clear 

separate mpg, by(foreign) veryshortlabel 

qqplot mpg1 mpg0 

qqplot mpg1 mpg0, ysc(log) xsc(log) 

shows that the distributions are related by a multiplicative 
shift (rather than an additive one). That is, the line 
of paired quantiles diverges from the line of equality
on raw scales, but is approximately parallel to it on log scales. 
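
As a rough numerical companion to the plots (a sketch added for 
illustration, not part of the original post), matching quantiles of 
the two groups can be compared directly. The choice of percentiles 
here is arbitrary. If the shift is multiplicative, the ratios should 
be roughly constant; if it is additive, the differences should be 
roughly constant instead. 

```stata
* Illustrative sketch: compare matching quantiles of the two groups.
sysuse auto, clear
separate mpg, by(foreign) veryshortlabel

* mpg0 holds domestic cars, mpg1 holds foreign cars.
* The percentiles chosen below are an arbitrary assumption.
foreach p in 10 25 50 75 90 {
    quietly centile mpg0, centile(`p')
    local q0 = r(c_1)
    quietly centile mpg1, centile(`p')
    local q1 = r(c_1)
    * Ratio and difference of the p-th percentiles (foreign vs domestic)
    display "p`p': ratio = " %5.2f `q1'/`q0' "  difference = " %5.1f `q1' - `q0'
}
```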

In this case, the standard t-test shows an overwhelming 
result, but it is still asking the wrong question: it tests 
for a difference in means, that is, for an additive shift!
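
One way to pose a multiplicative question instead is to run the 
same test on the log scale, where a difference in means corresponds 
to a ratio of geometric means. This is a minimal sketch under that 
assumption, not a procedure recommended in the post; the variable 
name logmpg is made up for the example. 

```stata
* Illustrative sketch: additive versus multiplicative comparison.
sysuse auto, clear

* The usual t-test compares arithmetic means (an additive question).
ttest mpg, by(foreign)

* On the log scale, a difference in means corresponds to a ratio of
* geometric means (a multiplicative question).
generate double logmpg = ln(mpg)
ttest logmpg, by(foreign)

* r(mu_1) is the mean for foreign == 0, r(mu_2) for foreign == 1,
* so this is the ratio of geometric means, foreign / domestic.
display "ratio of geometric means = " %5.3f exp(r(mu_2) - r(mu_1))
```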

Alejandro doesn't say what test is behind the statement 
that for his example means differ at a significance level 
of 0.01% (meaning, P < 0.0001), but I guess that he is 
using similar if not identical machinery. 

The difference between an additive shift and a multiplicative
shift is exactly the kind of important structure that is
often evident on a quantile-quantile plot, but somewhat lost
in the bars and baloney of an overlapping histogram. Karl 
Pearson is long dead, and rest in peace; let's bury 
unhelpful histograms too. 

Naturally, there is no guarantee that exactly the 
same trick will work with Alejandro's data, on 
percent of households, but percents tend not to be 
distributed symmetrically, so it wouldn't surprise me. 

Nick 
n.j.cox@durham.ac.uk 

Nick Cox 

> You say "overlap", but any overlapping is a property of 
> the data rather than of graphical procedure. You 
> can superimpose histograms, for example like this: 
> 
> sysuse auto, clear
> 
> twoway histogram mpg if foreign, ///
> start(10) width(2) bcolor(none) blcolor(red) || ///
> histogram mpg if !foreign ,     ///
> start(10) width(2) bcolor(none) blcolor(blue)  ///
> legend(order(1 "foreign" 2 "domestic") col(1) pos(1) ring(0))
> 
> but I find that in general the result is a mess: 
> 
> 1. If distributions overlap, then necessarily one histogram
> will partly occlude another. This can be reduced by 
> for example setting bar colours to invisible, but it cannot
> be eliminated. Perceiving the Gestalt is difficult even
> for foresters accustomed to seeing the trees for the wood
> and the wood for the trees. 
> 
> 2. There is always the minor -- and sometimes the major --
> worry of arbitrariness of bin width and origin. 
> 
> The histogram is 19th century technology: you can do 
> much better with 1960s technology, namely the quantile-quantile
> plot implemented as -qqplot-. 
 
Alejandro Delafuente
  
> > I would like to overlap two histograms. Can anyone tell me 
> > how to do so? The code that I have produced so far is the 
> > following, but it displays two separate histograms with the 
> > same scale magnitudes:
> > 
> > histogram CONTINUOUS VARIABLE if round!=1, percent lcolor(red) ///
> >     ytitle(Percent of households) xtitle(???) ///
> >     xlabel(0(.3)1, ticks) title(, justification(center)) ///
> >     note(, justification(center) alignment(top)) legend(off) ///
> >     by(round, ///
> >     note(Difference in means test significant at 0.01%, ///
> >     size(vsmall) justification(left)))
> 



