The Stata listserver

st: logarithmic scales

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: logarithmic scales
Date   Thu, 27 Nov 2003 16:04:03 -0000

Here's a small issue which is utterly elementary, but it may
provide a moment's bemusement, and I'd welcome any comments.

When drawing graphs with one or both axes on a logarithmic scale,
Stata by default tries to provide "nice" labels, just as usual. 

I'm writing a graphics program for a kind of plot in which one
axis will _always_ show a logarithmic scale, yet I find that
Stata's default for labels often gives me what is both a very 
sensible and a very poor answer. The fault is possibly as much 
mine as Stata's, as I may be quirky in what I most often want. 
Also, despite years of acquaintance, Stata can't know what I 
want unless I tell it. 

The issue can be made concrete easily: 

set obs 100 
gen y = _n 
range x 1 120 
scatter y x, xscale(log) 

Given the range 1 ... 120 Stata gives labels 50 100 150 and most
of the x axis is left unlabelled. Why? 

I'd anthropomorphise the decisions here made by algorithm: 

* Given the range, 50 is a "nice" interval to use, as 20 would
produce "too many" intervals and 100 "far too few". 
* But only two multiples of 50, 50 and 100, occur within the
range, still "too few", so we should stretch the range to produce
other labels. 0 is unplottable on a logarithmic scale, so go for
150 (which on a log scale is not much bigger than 120). 

I thought this was just an extreme case, but both 

range x 1 1000

and 

range x 1 10000 

produce very similar crowding on the right-hand side of the x
axis. Perhaps _you_ don't often get variables which range over 3
or more orders of magnitude, but I do. (To give the game away, I'm
plotting so-called return periods for extreme events, so a range
from < 1 year (say) to > 1000 or even > 10000 years is natural.) 

The broader issue is what count as "nice numbers" to show on
logarithmic scales, which seems to boil down to 

* what looks good

* what the designer wants to show or expects readers will want to see

* any tribal habits, conventions, standards, rules on how "graphs
are done in my field", neuropsychopharmacology, Martian
econometrics, whatever. 

Stata carries over from arithmetic scales the idea that nice
labels will show  numbers equally spaced on an arithmetic scale;
so the logarithmic scale is emphasised by the uneven positioning
of the labels. 

Another take, which happens to be mine more frequently, is that
nice labels will show numbers equally spaced on a logarithmic
scale, so the logarithmic scale is emphasised by the fact that
each unit step implies multiplying by a constant. (I also find
that key in explaining to students, many of whom have never met or
really understood logarithms.) 

It's not difficult to set up a default like this for a program or
.do file. 

Get the min and max, and conservatively round inwards to whole
powers of 10: 

su x, meanonly 
local min = ceil(log10(r(min))) 
local max = floor(log10(r(max)))

Show powers of 10 of the numlist min/max: 

forval i = `min'/`max' { 
	local labels "`labels' `=10^(`i')'" 
}

If that would mean only 1 or 2 labels shown, add some more: 

if (`max' - `min') < 2 {
	forval i = `=`min'-1'/`max' { 
		local labels "`labels' `=3 * 10^(`i')'"
	}
}

The code here may be a little difficult to read if you don't know
Stata syntax for macro evaluation on the fly, but this would add
labels 30 and 300 if otherwise the only label shown would be 100,
for example. 
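
For anyone wanting to try this, here is a minimal sketch that puts the
pieces above together with the earlier toy data and hands the result to
the graph. The only step not spelled out above is passing the macro to
the xlabel() option; the data setup and the explicit clearing of the
local are my assumptions for a standalone run, not part of any finished
program. 

```stata
* Toy data from the earlier example
clear
set obs 100
gen y = _n
range x 1 120

* Round inwards to whole powers of 10
su x, meanonly
local min = ceil(log10(r(min)))
local max = floor(log10(r(max)))

* Start from an empty list, then add the powers of 10
local labels
forval i = `min'/`max' {
	local labels "`labels' `=10^(`i')'"
}

* If that gives only 1 or 2 labels, add 3 * powers of 10 as well
if (`max' - `min') < 2 {
	forval i = `=`min'-1'/`max' {
		local labels "`labels' `=3 * 10^(`i')'"
	}
}

* Pass the list of labels to the log-scaled axis
scatter y x, xscale(log) xlabel(`labels')
```

With x running from 1 to 120 the rounded exponents are 0 and 2, so the
extra labels are not triggered and the axis is labelled at 1, 10 and
100. 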

The implementation could be refined; more important is the
underlying idea, that numbers like 

1, 3, 10, 30, 100 

look quite (British sense, not American) nice on a log scale:
evidently 3 is just less than sqrt(10), or otherwise put, log10(3)
is just less than 0.5, for example, so that 3 is almost halfway on
a log scale between 1 and 10. Even the texture, alternating jumps
of * 3 and * 10/3, could be seen as a small feature, reviving
distant memories of pre-prepared logarithmic graph paper. 
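
The arithmetic is easy to check from the command line. This is just a
quick illustration using display as a calculator: 

```stata
* 3 is almost halfway between 1 and 10 on a log scale
display log10(3)

* the exact halfway point is sqrt(10)
display sqrt(10)

* the other jump, from 3 to 10, is a factor of 10/3
display log10(10/3)
```

log10(3) is about 0.477 and sqrt(10) about 3.162, so the two jumps
cover roughly 0.477 and 0.523 of each decade. 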

Perhaps a little more common in literature is using numbers like 

1, 2, 5, 10, 20, 50, 100 

with the motivation that multiples of 2 and 5 are conventionally
regarded as "nicer" than multiples of 3. The alternation of jumps
of * 2 and * 5/2 is less of a drawback, in my view, than the
common consequence that graphs with such labels often appear too
"busy" by modern tastes. 
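
For comparison, a 1, 2, 5 sequence can be built in much the same way.
What follows is only a sketch under my own assumptions: the local names
lo, hi and v are made up for illustration, the exponents are rounded
outwards rather than inwards, and any candidate outside the observed
range is dropped so that the axis is not stretched. 

```stata
* Toy data from the earlier example
clear
set obs 100
gen y = _n
range x 1 120

* Keep the observed range and the decades that enclose it
su x, meanonly
local lo = r(min)
local hi = r(max)
local min = floor(log10(`lo'))
local max = ceil(log10(`hi'))

* Try 1, 2 and 5 times each power of 10, keeping values in range
local labels
forval i = `min'/`max' {
	foreach m in 1 2 5 {
		local v = `m' * 10^(`i')
		if `v' >= `lo' & `v' <= `hi' {
			local labels "`labels' `v'"
		}
	}
}

scatter y x, xscale(log) xlabel(`labels')
```

For x from 1 to 120 this gives seven labels, 1 2 5 10 20 50 100, which
shows how quickly the axis fills up compared with powers of 10 alone. 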

Any views on this, including reports on tribal attitudes or 

[email protected] 
