RE: st: easy histogram

Sat, 1 Mar 2003 18:18:15 -0000

David Airey replied to Jeff Pitblado

> > David makes a good point for removing the option for -histogram-,
> > and we will remove the checkboxes for log scales from the easy
> > graph dialog for -histogram-.
> >
> > However, the "full featured" dialog for -histogram- will remain
> > the same since both -xscale(log)- and -yscale(log)- are valid
> > -graph twoway- options.
>
> This reply beats around the bush and doesn't explain to me why
> "histogram mpg, xscale(log)" would ever make sense. That is really
> my question; I'm ignorant of the answer.
>
> Jeff's answer that histogram's option xscale is valid because it is
> a twoway explains why histogram inherits these options. That doesn't
> mean the design choice is a good one. The area under the histogram
> should sum to 1 when the yaxis is density, as stated in the manual.
> I don't think when xscale(log) is used it will (though I have not
> measured). If it doesn't, then the option is pointless (the option
> is not pointless for other graph twoway commands like scatter).
>
> Perhaps not all daughters of the mother twoway should inherit
> certain twoway options?

This in turn touches on various tricky design issues, one being how
far statistical software designers (a) should and (b) can decide ex
cathedra which kinds of graph are inadmissible or inappropriate,
especially when what may seem crazy in one field may turn out to have
a specific rationale in another. Excellence comes easily to Stata
Corp, but omniscience is an asymptotic property.

I don't know a strong case for binning on the original scale, yet
showing the results with -xscale(log)-. However, blowing up the
left-hand part of the scale like this might have some private use for
examining fine structure. For example, I have worked with glacier
area data which tend to be very heavily skewed and problematic at the
lower end.
Among other issues, it can be difficult to distinguish, especially
without a field visit, between a true glacier and an inert body, and
different scientists compiling area data (usually in some national
agency office) tacitly show different degrees of scepticism in
distinguishing glaciers and non-glaciers. For such a problem, graphs
of the kind discussed might have some private value, as there is
often merit in a scale which uses the units familiar to researchers.
I wouldn't publish such a histogram myself, but it might be of some
use.

More generally, the principle that the area under the histogram
should integrate to 1 -- or to the number of values -- is clearly a
good one. However, it is not the only criterion. Plotting log
frequency vs log magnitude is common in sedimentology. R.A. Bagnold
did this in his classic book on wind-blown sand in 1941 and
appropriate hyperbolic distributions have since been investigated by
O. Barndorff-Nielsen and others. Those ideas appear to be drifting
into other areas such as financial modelling.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

