"Nick Cox" <n.j.cox@durham.ac.uk>

<statalist@hsphsun2.harvard.edu>

st: RE: RE: Y axis values for hist ,density

Thu, 27 Oct 2005 16:14:45 +0100

Allan's complaining about perceived perversity, but I am not clear what he would regard as good behaviour. I can see a good case for arguing that with -histogram, discrete-, and nothing else said, the default should have been -frequency-, but yoking options like that is rarely good software design. Anyway, that wasn't done, and a change is now more difficult to justify.

As -frequency- is just an option away, this strikes me overall as a very little deal. I rarely get graphs right first time in any case, and others may have had similar experiences.

Nick
n.j.cox@durham.ac.uk

Jann Ben

> Bang! I don't agree. The purpose of a histogram is to make
> visible the shape of a density. It is therefore natural to
> report the y-axis in terms of a density.

Allan Reese (Cefas)

> > The default "hist x" command in Stata gives a Y axis labelled
> > as a density. I've never given it much attention until I saw
> > the scale went up to 2 on a plot. Hold on, density functions
> > sum to 1 over the variable.
> >
> > Further investigation and discussion with Statacorp
> > identified that the default tries to make the "area" of the
> > bars add up to 1. If the number of bars changes, so does
> > their width and so does the Y labelling. In my example, the
> > data were discrete, so increasing the number of intervals did
> > not change the plot except to add more zero-height columns
> > and hence make each column narrower.
> >
> > hist x, bin(n) therefore caused different Y
> > labelling with varying n
> > hist x, xscale(range(0 n)) did not affect the labelling,
> > though the bars got narrower with bigger n
> > hist x, frac and
> > hist x, discrete both gave correct labelling, and
> > the sum of column heights was 1.
> >
> > Do other users think this is perverse behaviour, especially
> > as the default? My take is that, when drawing a histogram,
> > the column width is taken as an arbitrary unit, not directly
> > related to the x-scale. The implication is that you need to
> > scale the height only when there are mixed-width columns, but
> > would not label the Y axis in "freq/absolute-width" units.
> > Having "densities" that vary and are in such peculiar units
> > (1/locust in my example!) does not seem helpful.
> >
> > Shoot me down
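
The area-scaling rule described in the quoted message can be checked with a few lines of arithmetic. The sketch below is in Python, not Stata, and makes no claim about how -histogram- is implemented. It assumes equal-width bins and scales each bar to count / (n * width), so that the bar areas sum to 1. The function name density_heights and the data are hypothetical, made up for illustration.

```python
def density_heights(values, lo, hi, bins):
    """Return (heights, width) with heights scaled so total bar area is 1.

    Assumes equal-width bins on [lo, hi]; hypothetical helper, not Stata's code.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Assign each value to its bin; a value at the top edge goes in the last bin.
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    n = len(values)
    # Density height = count / (n * width), so that sum(height * width) == 1.
    return [c / (n * width) for c in counts], width


# Made-up discrete data taking the values 0..4
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]

for bins in (5, 10, 20):
    heights, width = density_heights(data, 0, 5, bins)
    area = sum(h * width for h in heights)
    print(bins, round(max(heights), 2), round(area, 2))
```

With these data the tallest bar is 0.3 for 5 bins, 0.6 for 10 and 1.2 for 20, while the total area stays at 1. That matches the behaviour reported above: for discrete data, extra bins only add empty columns and narrow the occupied ones, so the density scale rises and can exceed 1.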

