
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: logarithmic scales
Date: Thu, 27 Nov 2003 16:04:03 -0000

Here's a small issue which is utterly elementary, but it may provide a moment's bemusement, and I'd welcome any comments.

When drawing graphs with one or both axes on a logarithmic scale, Stata by default tries to provide "nice" labels, just as usual. I'm writing a graphics program for a kind of plot in which one axis will _always_ show a logarithmic scale, yet I find that Stata's default for labels often gives me what is both a very sensible and a very poor answer. The fault is possibly as much mine as Stata's, as I may be quirky in what I most often want. Also, despite years of acquaintance, Stata can't know what I want unless I tell it.

The issue can be made concrete easily:

    set obs 100
    gen y = _n
    range x 1 120
    scatter y x, xscale(log)

Given the range 1 ... 120 Stata gives labels 50 100 150 and most of the x axis is left unlabelled. Why? I'd anthropomorphise the decisions here made by algorithm:

* Given the range, 50 is a "nice" interval to use, as 20 would produce "too many" intervals and 100 "far too few".

* But only two multiples of 50, 50 and 100, occur within the range, still "too few", so we should stretch the range to produce other labels. 0 is unplottable on a logarithmic scale, so go for 150 (which on a log scale is not much bigger than 120).

I thought this was just an extreme case, but both

    range x 1 1000

and

    range x 1 10000

produce very similar crowding on the right-hand side of the x axis.

Perhaps _you_ don't often get variables which range over 3 or more orders of magnitude, but I do. (To give the game away, I'm plotting so-called return periods for extreme events, so a range from < 1 year (say) to > 1000 or even > 10000 years is natural.)
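
For the 1 ... 120 example, one way of telling Stata what is wanted is to spell out the labels by hand with the xlabel() option. What follows is only a minimal sketch: the particular values 1 10 100 are an illustrative choice, not something Stata would pick for itself.

```stata
* Same data as in the example above
clear
set obs 100
gen y = _n
range x 1 120

* Override the default "nice" labels with hand-picked ones
* (1 10 100 is an assumption made for illustration)
scatter y x, xscale(log) xlabel(1 10 100)
```

That is fine for a one-off graph, but a program that must cope with whatever data it is given needs a rule rather than a hand-typed list.
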
The broader issue is what count as "nice numbers" to show on logarithmic scales, which seems to boil down to

* what looks good

* what the designer wants to show or expects readers will want to read

* any tribal habits, conventions, standards, rules on how "graphs are done in my field", neuropsychopharmacology, Martian econometrics, whatever.

Stata carries over from arithmetic scales the idea that nice labels will show numbers equally spaced on an arithmetic scale; so the logarithmic scale is emphasised by the uneven positioning of the labels.

Another take, which happens to be mine more frequently, is that nice labels will show numbers equally spaced on a logarithmic scale, so the logarithmic scale is emphasised by the fact that each unit step implies multiplying by a constant. (I also find that key in explaining to students, many of whom have never met or really understood logarithms.)

It's not difficult to set up a default like this for a program or .do file. Get the min and max, and conservatively round inwards from the extremes:

    su x, meanonly
    local min = ceil(log10(r(min)))
    local max = floor(log10(r(max)))

Show powers of 10 of the numlist min/max:

    forval i = `min'/`max' {
        local labels "`labels' `=10^(`i')'"
    }

If that would mean only 1 or 2 labels shown, add some more:

    if (`max' - `min') < 2 {
        forval i = `=`min'-1'/`max' {
            local labels "`labels' `=3 * 10^(`i')'"
        }
    }

The code here may be a little difficult to read if you don't know Stata syntax for macro evaluation on the fly, but this would add labels 30 and 300 if otherwise the only label shown would be 100, for example.

The implementation could be refined; more important is the underlying idea, that numbers like 1, 3, 10, 30, 100 look quite (British sense, not American) nice on a log scale: evidently 3 is just less than sqrt(10), or otherwise put, log10(3) is just less than 0.5, for example, so that 3 is almost halfway on a log scale between 1 and 10.
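
The fragments above can be strung together into a single runnable sketch and applied to the 1 ... 120 example. Two details are assumptions added here rather than part of the fragments themselves: clearing the local macro labels before building it, and passing the result to the graph through xlabel().

```stata
* Same data as in the first example
clear
set obs 100
gen y = _n
range x 1 120

* Round inwards from the extremes to whole powers of 10
su x, meanonly
local min = ceil(log10(r(min)))
local max = floor(log10(r(max)))

* Start from an empty list (added here as a precaution)
local labels

* Powers of 10 within the range
forval i = `min'/`max' {
    local labels "`labels' `=10^(`i')'"
}

* With only 1 or 2 powers of 10, add 3 * 10^i as well
if (`max' - `min') < 2 {
    forval i = `=`min'-1'/`max' {
        local labels "`labels' `=3 * 10^(`i')'"
    }
}

* Pass the constructed list to the graph (assumed usage)
scatter y x, xscale(log) xlabel(`labels')
```

For this range min is 0 and max is 2, so the labels should be 1 10 100 and the extra multiples of 3 are not triggered.
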
Even the texture, alternating jumps of * 3 and * 10/3, could be seen as a small feature, reviving distant memories of pre-prepared logarithmic graph paper.

Perhaps a little more common in the literature is using numbers like 1, 2, 5, 10, 20, 50, 100 with the motivation that multiples of 2 and 5 are conventionally regarded as "nicer" than multiples of 3. The alternation of jumps of * 2 and * 5/2 is less of a drawback, in my view, than the common consequence that graphs with such labels often appear too "busy" by modern tastes.

Any views on this, including reports on tribal attitudes or customs?

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
