
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: logarithmic scales
Date: Mon, 1 Dec 2003 15:31:34 -0000

My question was about default scales, and about "nice numbers",
but Roger here usefully broadens the discussion to how to customise
your own logarithmic scales. Defining a scale as the union of two
or more scales is certainly a nice idea.

Beyond that, some minor comments only:

1. Roger refers in passing to my -listutil- package. That package
lost a large fraction of its rationale when -forval- and -foreach-
were introduced in Stata 7 (and another large fraction when the stuff
documented under -macrolists- was introduced in Stata 8). In the same
sort of way, there is a trade-off between using specific tools
such as -explist-, which one has to look up if only using them
occasionally, and general tools which are more likely to be
used frequently and therefore remembered. Hence another way of
getting 0.75 * 2^i, i=0(1)20, is

    forval i = 0/20 {
        local list "`list' `=0.75 * 2^`i''"
    }

And this extends to multiple scales, here the two used in Roger's
example:

    forval i = 0/2 {
        local list "`list' `=1000*2^`i'' `=1500*2^`i''"
    }

Let me stress: there is a trade-off here. People not accustomed
to local macros, especially nested local macros, are likely to find
the format of Roger's -explist- more congenial (and it's certainly
very clear). No issue there; just be aware of alternatives.

2. (Quite different issue, but it's mentioned now.) Specifically,
as Roger knows, I'm not happy with log P-value as a _graphical_
scale. It is perhaps easier here to think in terms of base 10
logarithms, not that the issue is any different. Concretely,

    P = 1             log P = 0
    P = 0.05          log P = -1.3
    P = 0.01          log P = -2
    P = 0.001         log P = -3
    P = 1 in 10^6     log P = -6
    P = 1 in 10^20    log P = -20

In my experience, it's not that extraordinary to get a range
of P-values such that, plotted on a log scale,
(1) interesting marginal P-values are crowded into the
bottom of the graph (Roger understandably uses a reverse
scale) and (2) a large fraction of the space is dedicated
to a range one can't really think about (i.e. make discriminations
within that interval).
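To put a rough number on point (2), here is a minimal sketch, not taken from the thread, that recomputes the base 10 logarithms listed above and then works out how much of a log axis running from P = 1 down to P = 1 in 10^20 lies below P = 0.001. The cutoff 0.001 and the local macro names are assumptions chosen purely for illustration.

```stata
* Illustrative sketch only: P-values are those listed above
foreach p in 1 0.05 0.01 0.001 1e-6 1e-20 {
    display "P = " %8.0g `p' "   log10(P) = " %6.2f log10(`p')
}

* Fraction of a log axis from P = 1 down to P = 1e-20
* that is taken up by P-values below the (assumed) cutoff 0.001
local pmin  = 1e-20
local pcut  = 0.001
local share = (log10(`pcut') - log10(`pmin')) / (log10(1) - log10(`pmin'))
display "share of axis below P = `pcut': " %4.2f `share'
```

With these values the share is 17/20 = 0.85, so only 15% of the axis is left for everything between P = 1 and P = 0.001.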
I'd assert, perhaps very rashly, that beyond some threshold,
very low P-values are practically indistinguishable. I suppose
that log P-value of -20 is often appealing as a kind of
thermonuclear demolition of a null hypothesis, but I wonder
if anyone would think differently of (say) -6. Also, as is
well known, the further you go out into the tail the more you
depend on everything being as it should be (model assumptions,
data without measurement error, numerical analysis...).
On the other hand, there are situations in which an overwhelming
P-value is needed for any ensuing decision.

It is difficult to know of a general alternative. One possibility
is a Bond scale: show not log p, but p^0.07

     +---------------------+
     |        p     p^0.07 |
     |---------------------|
     |        1          1 |
     |      .05   .8108264 |
     |      .01    .724436 |
     |     .005   .6901252 |
     |     .001    .616595 |
     |---------------------|
     | 1.00e-06   .3801894 |
     | 1.00e-10   .1995262 |
     | 1.00e-20   .0398107 |
     +---------------------+

3. Going for floor(log(min)) and ceil(log(max)) can at worst
leave almost 2/3 of a graph axis corresponding to regions
without data. For example, suppose log(min) is just below 1
and log(max) just above 2; hence these rules extend the axes to
0 and 3 on a log scale. As said, that's a worst case, but
this rule often leaves lots of blank space.

Nick
n.j.cox@durham.ac.uk

Roger Newson

> At 16:04 27/11/03 +0000, Nick Cox wrote:
> >Here's a small issue which is utterly elementary, but it may
> >provide a moment's bemusement, and I'd welcome any comments.
> >
> >When drawing graphs with one or both axes on a logarithmic scale,
> >Stata by default tries to provide "nice" labels, just as usual.
> >
> >I'm writing a graphics program for a kind of plot in which one
> >axis will _always_ show a logarithmic scale, yet I find that
> >Stata's default for labels often gives me what is both a very
> >sensible and a very poor answer. The fault is possibly as much
> >mine as Stata's, as I may be quirky in what I most often want.
> >Also, despite years of acquaintance, Stata can't know what I
> >want unless I tell it.
> >
> .
> .
> .
> >Any views on this, including reports on tribal attitudes or
> >customs?
>
> I too have had a similar experience to Nick's, while writing my own
> -smileplot- package (downloadable from SSC). My personal custom is to use
> my own -explist- package (downloadable from SSC and modelled broadly on
> Nick Cox's -listutil- suite) to define an exponentially spaced list of
> numbers, with a logarithmic base defined by the -base()- option and a scale
> defined by the -scale()- option. -explist- takes, as input, a list of
> numbers x_1, ... , x_n, and creates, as output, a list of numbers
> y_1, ... , y_n, derived exponentially from the input list, so that,
> for each i,
>
> y_i = scale*base^x_i
>
> Therefore, if I want a scale that starts at 0.75 and has a tick at each
> doubling, then I might type
>
> . explist 0(1)10, scale(0.75) base(2)
>
> . retu list
>
> macros:
>             r(explist) : ".75 1.5 3 6 12 24 48 96 192 384 768"
>
> If I want the scale to be a bit busier, then I might use multiple -explist-
> lists with the same -base()- option and different -scale()- options, and
> concatenate them in the -xlabel()- option of -graph twoway- as follows:
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . explist 0(1)2, base(2) scale(1000) global(g1)
>
> . explist 0(1)2, base(2) scale(1500) global(g2)
>
> . scatter length weight, xscale(log) yscale(log) xlab($g1 $g2)
>
> and the X-axis labels will be -xlab(1000 1500 2000 3000 4000 6000)-.
>
> This leaves the question of choosing the base, scale and input list for
> -explist-. In the case of -smileplot-, the Y-axis variable is a P-value, so
> I set the logarithmic scale to start from one and to end at
> ceil(-log(pmin)/log(base)), where -pmin- is the smallest P-value.
> The increments are chosen, by default, to keep the number of labels to no
> more than 25 (which was the largest number allowed by Stata 7). In the
> general case, it is probably a good idea to use the -ceil()- function on
> the maximum log (to the chosen base) and the -floor()- function on the
> minimum log (to the same chosen base) to define the top and bottom of the
> scale, and to choose the scale(s) and input list to optimise the number of
> tick marks or labels by some criterion.
>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
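The floor()/ceil() rule discussed in this thread, and the blank space it can leave, can be sketched roughly as follows. This is an illustration only: the data range (9.9 to 101), the choice of base 10, and the local macro names are assumptions picked to sit near the worst case mentioned in point 3, not values from either message.

```stata
* Hypothetical data range, chosen so that log10(min) is just below 1
* and log10(max) is just above 2
local min = 9.9
local max = 101

* Axis limits on the log10 scale under the floor()/ceil() rule
local lo = floor(log10(`min'))
local hi = ceil(log10(`max'))

* Fraction of the axis from 10^lo to 10^hi that contains no data
local blank = 1 - (log10(`max') - log10(`min')) / (`hi' - `lo')
display "axis runs from 10^`lo' to 10^`hi'; blank fraction = " %4.2f `blank'
```

Here the axis runs from 10^0 to 10^3 while the data cover only a little over one decade, so roughly two thirds of the axis is empty.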
