The Stata listserver

st: RE: RE: Y axis values for hist ,density

From   "Nick Cox" <>
To   <>
Subject   st: RE: RE: Y axis values for hist ,density
Date   Thu, 27 Oct 2005 16:14:45 +0100

Allan's complaining about perceived perversity, but I am not clear what he 
would regard as good behaviour. 

I can see a good case for arguing that with -histogram, discrete-, and nothing 
else said, the default should have been -frequency-, but yoking options like 
that is rarely good software design. Anyway, that wasn't done, and a change 
is now more difficult to justify. 

As -frequency- is just an option away, this strikes me overall as a
very small deal. I rarely get graphs right first time in any case, and
others may have had similar experiences.


Jann Ben
> Bang! I don't agree. The purpose of a histogram is to make 
> visible the shape of a density. It is therefore natural to 
> report the y-axis in terms of a density. 

Allan Reese (Cefas)
> > The default "hist x" command in Stata gives a Y axis labelled
> > as a density.  I had never given it much attention until I saw
> > that the scale went up to 2 on a plot.  Hold on, density functions
> > sum to 1 over the variable.
> > 
> > Further investigation and discussion with StataCorp
> > identified that the default tries to make the "area" of the 
> > bars add up to 1.  If the number of bars changes, so does 
> > their width and so does the Y labelling.  In my example, the 
> > data were discrete, so increasing the number of intervals did 
> > not change the plot except to add more zero-height columns 
> > and hence make each column narrower.
> > 
> > hist x, bin(n)               therefore gave different Y
> >                              labelling for different n;
> > hist x, xscale(range(0 n))   did not affect the labelling,
> >                              though the bars got narrower with bigger n;
> > hist x, frac                 and
> > hist x, discrete             both gave correct labelling, and
> >                              the sum of column heights was 1.
> > 
> > Do other users think this is perverse behaviour, especially 
> > as the default?  My take is that, when drawing a histogram, 
> > the column width is taken as an arbitrary unit, not directly 
> > related to the x-scale.  The implication is that you need to
> > scale the height only when there are mixed-width columns, but
> > you would still not label the Y axis in "frequency per absolute
> > width" units.
> > Having "densities" that vary and are in such peculiar units 
> > (1/locust in my example!) does not seem helpful.
> > 
> > Shoot me down
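The scaling Allan describes can be written out explicitly. This is a sketch using notation introduced here for illustration (N, n_k, h_k are not taken from the Stata manuals): there are N observations, and bin k has width h_k and contains n_k of them. The three y-axis scales are then related as follows, and only the density heights, weighted by bin width, are guaranteed to total 1:

```latex
\text{frequency}_k = n_k, \qquad
\text{fraction}_k = \frac{n_k}{N}, \qquad
\text{density}_k = \frac{n_k}{N\,h_k} = \frac{\text{fraction}_k}{h_k},
\qquad\text{so}\qquad
\sum_k h_k\,\text{density}_k = \sum_k \frac{n_k}{N} = 1 .
```

For integer data binned with a common width h small enough that each bin holds at most one distinct value, shrinking h raises every density height while the fractions stay the same. For example, a value holding half the observations has density 0.5/0.25 = 2 when h = 0.25. With -discrete- on integer data, h = 1, so density and fraction coincide numerically, which is consistent with both giving heights that sum to 1. The density is measured per unit of x, hence "1/locust" in Allan's example.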


© Copyright 1996–2017 StataCorp LLC