The Stata listserver

RE: st: easy histogram

From   "Nick Cox"
Subject   RE: st: easy histogram
Date   Sat, 1 Mar 2003 18:18:15 -0000

David Airey replied to Jeff Pitblado:
> > David makes a good point for removing the option for -histogram-, and
> > we will remove the checkboxes for log scales from the easy graph
> > dialog for -histogram-.
> >
> > However, the "full featured" dialog for -histogram- will remain the
> > same since both -xscale(log)- and -yscale(log)- are valid -graph
> > twoway- options.
> >
> This reply beats around the bush and doesn't explain to me why
> "histogram mpg, xscale(log)" would ever make sense. That is really my
> question; I'm ignorant of the answer.
> Jeff's answer that histogram's option xscale is valid because it is a
> twoway explains why histogram inherits these options. That doesn't mean
> the design choice is a good one. The area under the histogram should
> sum to 1 when the yaxis is density, as stated in the manual. I don't
> think when xscale(log) is used it will (though I have not measured). If
> it doesn't, then the option is pointless (the option is not pointless
> for other graph twoway commands like scatter).
> Perhaps not all daughters of the mother twoway should inherit certain
> twoway options?

This in turn touches on various tricky design issues, one 
being how far statistical software designers (a) should 
and (b) can decide ex cathedra which kinds of graph 
are inadmissible or inappropriate, especially when what 
may seem crazy in one field may turn out to have 
a specific rationale in another. Excellence comes 
easily to Stata Corp, but omniscience is an asymptotic

I don't know a strong case for binning on the original scale, 
yet showing the results with -xscale(log)-. However, blowing 
up the left-hand part of the scale like this 
might have some private use for examining fine structure. 
For example, I have worked with glacier area data which 
tend to be very heavily skewed and problematic at the lower end. 
Among other issues, it can be difficult to distinguish, especially 
without a field visit, between a true glacier and an inert body, 
and different scientists compiling area data (usually in some 
national agency office) tacitly show different degrees 
of scepticism in distinguishing glaciers and non-glaciers. 
For such a problem, graphs of the kind discussed might have some 
private value, as there is often merit in a scale which uses the 
units familiar to researchers. I wouldn't publish such 
a histogram myself, but it might be of some use. 

More generally, the principle that the area under the histogram 
should be 1 -- or the number of values -- 
is clearly a good one. However, it is not the only criterion. 
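
The area point can be checked with a little arithmetic. What follows is 
a minimal sketch in Python, not Stata, and not a description of how 
-histogram- works internally: the data are made up for illustration and 
density_histogram is a hypothetical helper. It bins on the original 
scale, confirms that the density bars have total area 1 there, and then 
measures the same bars with the widths they would be drawn with on a 
log10 axis. 

```python
import math

def density_histogram(values, start, width, nbins):
    """Return (edges, heights), heights scaled so that total bar area is 1."""
    counts = [0] * nbins
    for v in values:
        # Values at or beyond the top edge are placed in the last bin.
        k = min(int((v - start) / width), nbins - 1)
        counts[k] += 1
    n = len(values)
    edges = [start + i * width for i in range(nbins + 1)]
    heights = [c / (n * width) for c in counts]
    return edges, heights

# Made-up, right-skewed illustrative data (not real measurements).
values = [1.2, 1.5, 1.7, 2.1, 2.4, 3.3, 4.8, 6.5, 9.0, 15.5]

# Bin on the original scale.
edges, heights = density_histogram(values, start=1.0, width=3.0, nbins=5)

# Area with bar widths measured on the original scale.
area_linear = sum(h * (hi - lo)
                  for h, lo, hi in zip(heights, edges, edges[1:]))

# Area of the same bars with widths measured as drawn on a log10 axis.
area_log_axis = sum(h * (math.log10(hi) - math.log10(lo))
                    for h, lo, hi in zip(heights, edges, edges[1:]))

# Alternative: bin the logged values, so that density is per log10 unit.
log_values = [math.log10(v) for v in values]
log_edges, log_heights = density_histogram(log_values, start=0.0,
                                           width=0.25, nbins=5)
area_binned_on_log = sum(h * (hi - lo)
                         for h, lo, hi in zip(log_heights, log_edges,
                                              log_edges[1:]))

print(area_linear, area_log_axis, area_binned_on_log)
```

With these numbers the bars measured on the log axis cover much less 
than 1, while binning the logged values restores unit area, at the cost 
of a scale in log units rather than the units familiar to researchers. 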

Plotting log frequency vs log magnitude is 
common in sedimentology. R.A. Bagnold did this 
in his classic book on wind-blown sand in 1941 
and appropriate hyperbolic distributions have since been 
investigated by O. Barndorff-Nielsen and others. Those 
ideas appear to be drifting into other areas such as 
financial modelling.
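
As a rough illustration of the log frequency vs log magnitude idea, 
here is a minimal sketch in Python, not Stata. It is not a 
reconstruction of Bagnold's own procedure: the sizes are made up and 
log_log_points is a hypothetical helper. It bins log10(size) into 
classes of equal width on the log scale and returns, for each 
non-empty class, the class midpoint and the log10 of the class count. 

```python
import math

def log_log_points(sizes, log_width):
    """Bin log10(size) into equal-width classes.

    Returns (class midpoint in log10 units, log10 of class count)
    for each non-empty class, in increasing order of magnitude.
    """
    logs = [math.log10(s) for s in sizes]
    # Align the first class boundary to a multiple of log_width.
    start = math.floor(min(logs) / log_width) * log_width
    counts = {}
    for x in logs:
        k = int((x - start) // log_width)
        counts[k] = counts.get(k, 0) + 1
    # Empty classes are skipped because log10(0) is undefined.
    return [(start + (k + 0.5) * log_width, math.log10(c))
            for k, c in sorted(counts.items())]

# Made-up, right-skewed illustrative sizes (not real measurements).
sizes = [0.11, 0.13, 0.18, 0.2, 0.25, 0.3, 0.45, 0.6, 0.9, 1.4, 2.5, 7.0]
points = log_log_points(sizes, log_width=0.5)

for midpoint, log_count in points:
    print(f"{midpoint:6.2f}  {log_count:5.3f}")
```

Plotting the second column against the first gives the log-log display; 
how empty classes should be handled is a choice the sketch simply 
sidesteps by omitting them. 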
