
RE: st: logarithmic scales

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: logarithmic scales
Date   Mon, 1 Dec 2003 15:31:34 -0000

My question was about default scales, and
about "nice numbers", but Roger
here usefully broadens the discussion to how
to customise your own logarithmic scales.
Defining a scale as the union of two or
more scales is certainly a nice idea.

Beyond that, some minor comments only:

1. Roger refers in passing to my -listutil-
package. That package lost a large fraction
of its rationale when -forval- and -foreach-
were introduced in 7 (and another large fraction
when the stuff documented under -macrolists- was
introduced in 8). In the same sort of way,
there is a trade-off between using specific tools
such as -explist-,
which one has to look up if only using occasionally,
and general tools which are more likely to be
used frequently and therefore remembered.

Hence another way of getting 0.75 * 2^i, i=0(1)20,

forval i = 0/20 {
	local list "`list' `=0.75 * 2^`i''"
}

And this extends to multiple scales

forval i = 0/2 {
	local list "`list' `=1000*2^`i'' `=1500*2^`i''"
}

Let me stress: there is a trade-off here. People
not accustomed to local macros, especially nested
local macros, are likely to find the format of Roger's -explist-
more congenial (and it's certainly very clear).
No issue there; just be aware of alternatives.
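
As a minimal sketch of how such a macro might be fed to a graph, the
following assumes the auto data and the scales 1000 and 1500 from
Roger's -explist- calls. The macro name -ticks- is hypothetical, and
the macro is cleared first so that repeated runs do not accumulate
labels.

```stata
* Sketch only: build doubling labels in a local macro, then pass them to -xlabel()-.
sysuse auto, clear

* Start from an empty macro so reruns do not append to an old list.
local ticks
forval i = 0/2 {
	local ticks "`ticks' `=1000*2^`i'' `=1500*2^`i''"
}

* Show what was built, then use it for the axis labels.
display "`ticks'"
scatter length weight, xscale(log) yscale(log) xlabel(`ticks')
```

The labels come out as 1000 1500 2000 3000 4000 6000, the same set
Roger obtains from his two -explist- calls.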

2. (Quite different issue, but it's mentioned now)

Specifically, as Roger knows, I'm not
happy with log P-value as a _graphical_ scale.

It is perhaps easier here to think in terms
of base 10 logarithms, not that the issue
is any different. Concretely,

P = 1              log P = 0
P = 0.05           log P = -1.3
P = 0.01           log P = -2
P = 0.001          log P = -3
P = 1 in 10^6      log P = -6
P = 1 in 10^20     log P = -20

In my experience, it's not that extraordinary
to get a range of P-values such that, plotted
on a log scale, (1) interesting marginal P-values
are crowded into the bottom of the graph (Roger
understandably uses a reverse scale) and (2) a
large fraction of the space is dedicated
to a range one can't really think about (i.e.
make discriminations within that interval).

I'd assert, perhaps very rashly, that beyond
some threshold, very low P-values are
practically indistinguishable. I suppose that
log P-value of -20 is often appealing as a kind of
thermonuclear demolition of a null hypothesis, but I wonder
if anyone would think differently of (say) -6. Also,
as is well known, the further you go out into
the tail the more you depend on everything being
as it should be (model assumptions, data without
measurement error, numerical analysis...).
On the other hand, there are situations
in which an overwhelming P-value is needed
for any ensuing decision.

It is difficult to know of a general
alternative. One possibility is
a Bond scale: show not log p,
but  p^0.07

  |        p     p^0.07 |
  |        1          1 |
  |      .05   .8108264 |
  |      .01    .724436 |
  |     .005   .6901252 |
  |     .001    .616595 |
  | 1.00e-06   .3801894 |
  | 1.00e-10   .1995262 |
  | 1.00e-20   .0398107 |
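
A listing like this can be reproduced with a few lines, sketched here
on the assumption that the P-values are typed in directly. The variable
names -log10p- and -p007- are hypothetical; the log is included only
for comparison with the power scale.

```stata
* Sketch only: tabulate log10(p) and p^0.07 for a handful of P-values.
clear
input double p
1
.05
.01
.005
.001
1e-6
1e-10
1e-20
end

generate double log10p = log10(p)
generate double p007 = p^0.07
list p log10p p007, noobs
```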

3. Going for floor(log(min)) and ceil(log(max))
can at worst leave almost 2/3 of a graph axis
corresponding to regions without data.
For example, suppose log(min) is just below
1 and log(max) just above 2; hence these rules
extend the axes to 0 and 3 on a log scale. As
said, that's a worst case, but this rule often
leaves lots of blank space.
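
The worst case can be checked with a little arithmetic, sketched here
with the made-up values 9.9 and 101 standing in for a minimum with
log10 just below 1 and a maximum with log10 just above 2. The scalar
names are hypothetical.

```stata
* Sketch only: how much of the axis is empty under the floor/ceil rule.
scalar lo = log10(9.9)
scalar hi = log10(101)

* Axis limits implied by the rule, in log10 units.
scalar axis = ceil(hi) - floor(lo)

* Share of the axis length that lies outside the range of the data.
scalar blank = 1 - (hi - lo)/axis

display "axis runs from " floor(lo) " to " ceil(hi) " on the log10 scale"
display "fraction of axis without data: " %5.3f blank
```

With these values the axis runs from 0 to 3 and the blank fraction is
just under 2/3.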

[email protected]

Roger Newson wrote:

> At 16:04 27/11/03 +0000, Nick Cox wrote:
> >Here's a small issue which is utterly elementary, but it may
> >provide a moment's bemusement, and I'd welcome any comments.
> >
> >When drawing graphs with one or both axes on a logarithmic scale,
> >Stata by default tries to provide "nice" labels, just as usual.
> >
> >I'm writing a graphics program for a kind of plot in which one
> >axis will _always_ show a logarithmic scale, yet I find that
> >Stata's default for labels often gives me what is both a very
> >sensible and a very poor answer. The fault is possibly as much
> >mine as Stata's, as I may be quirky in what I most often want.
> >Also, despite years of acquaintance, Stata can't know what I
> >want unless I tell it.
> >
> .
> .
> .
> >Any views on this, including reports on tribal attitudes or
> >customs?
>
> I too have had a similar experience to Nick's, while writing my own
> -smileplot- package (downloadable from SSC). My personal custom is to
> use my own -explist- package (downloadable from SSC and modelled
> broadly on Nick Cox's -listutil- suite) to define an exponentially
> spaced list of numbers, with a logarithmic base defined by the
> -base()- option and a scale defined by the -scale()- option.
> -explist- takes, as input, a list of numbers x_1, ... , x_n, and
> creates, as output, a list of numbers y_1, ... , y_n, derived
> exponentially from the input list, so that, for each i,
>
> y_i = scale*base^x_i
>
> Therefore, if I want a scale that starts at 0.75 and has a tick at
> each doubling, then I might type
>
> . explist 0(1)10, scale(0.75) base(2)
> . retu list
>
> macros:
>             r(explist) : ".75 1.5 3 6 12 24 48 96 192 384 768"
>
> If I want the scale to be a bit busier, then I might use multiple
> -explist- lists with the same -base()- option and different -scale()-
> options, and concatenate them in the -xlabel()- option of
> -graph twoway- as follows:
>
> . sysuse auto, clear
> (1978 Automobile Data)
> . explist 0(1)2, base(2) scale(1000) global(g1)
> . explist 0(1)2, base(2) scale(1500) global(g2)
> . scatter length weight, xscale(log) yscale(log) xlab($g1 $g2)
>
> and the X-axis labels will be -xlab(1000 1500 2000 3000 4000 6000)-.
>
> This leaves the question of choosing the base, scale and input list
> for -explist-. In the case of -smileplot-, the Y-axis variable is a
> P-value, so I set the logarithmic scale to start from one and to end
> at ceil(-log(pmin)/log(base)), where -pmin- is the smallest P-value.
> The increments are chosen, by default, to keep the number of labels
> to no more than 25 (which was the largest number allowed by Stata 7).
> In the general case, it is probably a good idea to use the -ceil()-
> function on the maximum log (to the chosen base) and the -floor()-
> function on the minimum log (to the same chosen base) to define the
> top and bottom of the scale, and to choose the scale(s) and input
> list to optimise the number of tick marks or labels by some
> criterion.
