
Re: st: RE: Histograms (was: Multiple (overlaid) Histogram)


From   "R. Allan Reese" <[email protected]>
To   Stata distribution list <[email protected]>
Subject   Re: st: RE: Histograms (was: Multiple (overlaid) Histogram)
Date   Thu, 29 May 2003 13:07:13 +0100 (BST)

On Thu, 29 May 2003, Nick Cox wrote:
> ... Empirical. You will see histograms with unequal widths
> particularly in older books and papers, and the reason was
> that data for them came already grouped in such classes. There's
> an example in Snedecor and Cochran's venerable text.
> That seems far less common today when more and more data sets are
> available in raw, ungrouped form, modulo confidentiality
> constraints. I don't see people asking for this often on Statalist,
> and one good reason for this being low down in priority is that
> it is in practice rarely needed.

The linked issue is whether it is strictly true, as Nick previously
commented, that "adjacent bars touch. (If this isn't true, you haven't got
a proper histogram.)"  In a histogram, it is the area of a bar, not its
height, that represents the weight of data.  Hence a class interval that
is widened should be proportionately reduced in height.

I suggest this is a "design decision" which has implications for the
message conveyed by a graph.  Consider a data series such as
4,4,4,5,5,5,5,5,6,6,6,6,16 and use for convenience unit-width bins.
Stata's histogram command shows the 16 as a single observation and,
implicitly, as an outlier.  If you don't allow zero-height bars but demand
that adjacent bars touch, the upper bin might run from 7 to 16 with a
height of 0.11 on the frequency scale (one observation spread over a width
of 9), and the data now look like a skewed distribution with a
long upper tail.  Neither version is more "correct" absolutely, though one
may be more appropriate to an interpretation of the data.
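
The arithmetic behind the two versions can be checked with a short sketch.
This is Python rather than Stata, and the function name bar_heights and the
convention that the last class is closed on the right are my own
assumptions for illustration, not anything Stata does.

```python
# Illustrative sketch only: bar heights on the frequency scale, where
# height = count / width so that the AREA of each bar equals its count.
data = [4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 16]


def bar_heights(values, edges):
    """Return (left, right, count, height) for each class defined by edges.

    Classes are half-open [left, right), except the last, which is closed
    on the right (an assumption made here so that 16 falls in [7, 16]).
    """
    bars = []
    last = len(edges) - 2
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        if i == last:
            count = sum(lo <= v <= hi for v in values)
        else:
            count = sum(lo <= v < hi for v in values)
        bars.append((lo, hi, count, count / (hi - lo)))
    return bars


# Version 1: unit-width bins from 4 to 17; empty bins get zero height.
unit_bins = bar_heights(data, list(range(4, 18)))

# Version 2: no zero-height bars; the upper class runs from 7 to 16.
merged = bar_heights(data, [4, 5, 6, 7, 16])

for lo, hi, count, height in merged:
    print(f"[{lo}, {hi}]  count={count}  height={height:.2f}")
```

In the second version the last bar has count 1 and width 9, hence the
height of 0.11, while the total area of the bars is still 13, the number
of observations.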

Hence, I would support adding the option in the software, e.g. a new option
"classes(4,5,6,7,16)" or "width(1,1,1,9)", to allow irregular spacing.
The user then has control of the design choice, rather than being
compelled by the software (writer).
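
The two suggested forms carry the same information once a starting point
is known.  A minimal sketch in Python, assuming the width form would also
need a lower limit for the first class (here 4); the helper names are
hypothetical and neither option exists in Stata's histogram command:

```python
from itertools import accumulate


def edges_from_widths(start, widths):
    """Turn a starting point and a list of class widths into class limits."""
    return list(accumulate(widths, initial=start))


def widths_from_edges(edges):
    """Turn a list of class limits into the corresponding class widths."""
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


classes = edges_from_widths(4, [1, 1, 1, 9])   # limits 4, 5, 6, 7, 16
widths = widths_from_edges([4, 5, 6, 7, 16])   # widths 1, 1, 1, 9
```

Either form would do for the user; the limits form has the small advantage
of not needing a separate starting value.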

Excel has particularly abhorrent approaches to the choice and labelling of
bins for histograms.

R. Allan Reese                       Email: [email protected]

