The Stata listserver

RE: st: RE: Histograms (was: Multiple (overlaid) Histogram)


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Histograms (was: Multiple (overlaid) Histogram)
Date   Thu, 29 May 2003 13:58:40 +0100

R. Allan Reese wrote:

> On Thu, 29 May 2003, Nick Cox wrote:
> > ... Empirical. You will see histograms with unequal widths
> > particularly in older books and papers, and the reason was
> > that data for them came already grouped in such classes. There's
> > an example in Snedecor and Cochran's venerable text.
> > That seems far less common today when more and more data sets are
> > available in raw, ungrouped form, modulo confidentiality
> > constraints. I don't see people asking for this often on Statalist,
> > and one good reason for this being low down in priority is that
> > it is practice rarely needed.
>
> The linked issue is whether it is strictly true, as Nick previously
> commented, that "adjacent bars touch. (If this isn't true, you haven't
> got a proper histogram.)"  In a histogram, it is unit area that
> represents the weight of data.  Hence a class interval that is widened
> should be proportionately reduced in height.
>
> I suggest this is a "design decision" which has implications for the
> message conveyed by a graph.  Consider a data series such as
> 4,4,4,5,5,5,5,5,6,6,6,6,16 and use for convenience unit-width bins.
> Stata's histogram command shows the 16 as a single observation and,
> implicitly, as an outlier.  If you don't allow zero-height bars but
> demand that adjacent bars touch, the upper bin might run from 7 to 16
> with a height of 0.11 and the data now look like a skewed distribution
> with a long upper tail.  Neither version is more "correct" absolutely,
> though one may be more appropriate to an interpretation of the data.
>
> Hence, I would support adding the option in the software, eg a new
> option "classes(4,5,6,7,16)" or "width(1,1,1,9)", to allow irregular
> spacing.  The user then has control of the design choice, rather than
> being compelled by the software (writer).
>
> Excel has particularly abhorrent approaches to the choice and
> labelling of bins for histograms.

As for the definition, I am more than happy with the idea that some
bars are of zero height and touch other bars which may be of zero
or non-zero height. That is, my definition emphatically does not
rule out gaps between bars: they are just populated by bars of
zero height. (Or, if you like, the principle that adjacent bars
touch does not rule out the possibility of bars not being adjacent.)
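
To make the arithmetic in Allan's example concrete, here is a minimal
Python sketch (not Stata's -histogram-, and not anything from the
original exchange). It assumes bar height is count divided by bin
width, which is the scale on which the 7-to-16 bin has height 0.11,
and it assumes bins are half-open except for the last one. The
function name bar_heights is hypothetical.

```python
# Data series from the quoted message.
data = [4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 16]


def bar_heights(values, edges):
    """Return (left, right, count, height) for each bin.

    Height is count / width, so each bar's area equals its count.
    Assumed convention: bins are [left, right), except the last bin,
    which also includes its right edge.
    """
    bars = []
    last = len(edges) - 2
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        if i == last:
            count = sum(left <= v <= right for v in values)
        else:
            count = sum(left <= v < right for v in values)
        bars.append((left, right, count, count / (right - left)))
    return bars


# Unit-width bins with edges 4, 5, ..., 17: the gap between 7 and 16
# is filled by bars of zero height, and 16 stands alone.
unit_bins = bar_heights(data, list(range(4, 18)))

# Irregular bins 4, 5, 6, 7, 16: the last bin is 9 units wide and
# holds one observation, so its height is 1/9.
wide_tail = bar_heights(data, [4, 5, 6, 7, 16])

for left, right, count, height in wide_tail:
    print(f"[{left}, {right}): count={count}, height={height:.2f}")
```

Under either binning the total area of the bars equals the number of
observations (13); only the shape suggested to the eye differs.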

As for an extra option, it is easy to specify this as a desired item,
but my guess is that implementing this on top of the existing
-histogram- command would be far more of a labour than the real
benefits imply. That's Stata Corp's problem, but it could be enough
to push this a long way down the list of priorities.

In addition, I doubt that all users have as much graphical sense
as Allan. This option could be a gateway to lots of rather silly
histograms, and although one shouldn't rule out syntax on the
grounds that it might be abused, I feel queasy at the prospect.

Nick
[email protected]
