RE: st: RE: RE: RE: truncating graph range


From:    "Nick Cox" <n.j.cox@durham.ac.uk>
To:      <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: RE: RE: truncating graph range
Date:    Wed, 2 Nov 2005 12:52:41 -0000

Next door to "Can you do X in Stata?" is 
"Should you do X anyway?". I focused on the first
in my previous replies, but the second is crucial
too. 

Nothing that follows alters the key facts that this 
is your report and that you know its audience, but 
I have comments on various levels. 

-1. What is Excel? 

0. Showing the data fully and honestly is by far 
the best strategy, and the reasons for not doing 
that should be clear and overwhelming. 

1. If your readers are confused by logs, one option 
is to explain them. ("The Economist" regularly uses 
log scales without apology.) The first graph could 
explain what log scales are. 

2. If your outliers are so extreme that you want 
to exclude them from many graphs, does the rest of 
your analysis cope with the outliers optimally? 

3. Some things can be done without low-level 
programming. I generated some spiky series 

clear
set seed 2803 
set obs 100
forval i = 1/10 { 
	gen y`i' = 1/uniform()
} 
su 
gen x = _n 

and then generated a series of graphs in 
uniform style. The idea was to omit values
above a threshold, but to show the values 
as text just above that threshold and to 
show the omission explicitly by a break 
in the line. This falls short of lines pointing
towards the outlier. 

You would need to choose your own threshold. You might 
not need -mlabangle(vertical)-, which leads to giraffe 
graphics, but in real data, even more than in random data, 
outliers are quite likely to be next to each other, and 
horizontal labels would then overlap. 

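* text to display in place of each omitted value
qui gen show = "" 
* height at which that text is anchored, just above the threshold of 100
gen high = 105 
forval i = 1/10 { 
	clonevar temp = y`i' 
	* blank out values above the threshold; cmissing(n) then breaks the line there
	qui replace temp = . if y`i' > 100 
	qui replace show = string(y`i', "%7.1g") if y`i' > 100 
	line temp x , cmissing(n) ysc(r(0,105)) || /// 
	scatter high x , ms(none) mlabpos(6) mla(show) mlabangle(vertical) ///
	legend(off) ytitle(y`i') 
	more
	* reset for the next series
	qui replace show = "" 
	drop temp 
}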
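
For comparison with point 1, a log scale omits nothing and needs 
no threshold at all. What follows is only a minimal sketch, assuming 
the simulated y1-y10 and x generated above are still in memory; the 
tick values in -ylabel()- are arbitrary choices for illustration 
and would need adjusting for real data. 

```stata
* sketch only: the same simulated series on a log scale, nothing omitted
* (the ylabel() tick values are illustrative assumptions, not recommendations)
forval i = 1/10 { 
	line y`i' x , yscale(log) ylabel(1 10 100 1000, angle(horizontal)) ///
	ytitle(y`i') 
	more
}
```

All values here are at least 1, so the log scale is safe; a series 
with zeros or negative values could not be shown this way. 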

Nick 
n.j.cox@durham.ac.uk 

Timothy Dang
 
> Thanks Nick & Allan. I have a lot of graphs I'm going to need to put
> together for an appendix, and I was thinking that automating it with
> Stata would give me the uniform appearance I wanted relatively
> painlessly, except for that outlier problem. I'm pretty sure that in
> almost all the cases a log scale would be confusing to the reader, so
> I don't want to go with that.
> 
> Allan, I'll play with your suggestions and see if they do the trick
> for me. Otherwise, it may be graphing in Excel, which does this
> readily.
> 
> In the spirit of Nick's reminder on closing threads, I'll come back
> and report if there's a solution that worked for me.
> 
> Thanks!
> 
> On 10/31/05, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> > Allan's main advice is to fit a regression line with
> > an outlier and to show it on a graph together
> > with all the other data.
> >
> > That's often a useful technique, but I read
> > Timothy's discussion of line plots as wanting
> > something quite different, namely
> >
> >                                 *     outlier off graph
> >
> >                       /    \
> >                      /      \
> >                     /        \   lines on graph pointing to it
> >
> > I am sure that this is programmable, but I don't know an
> > easy and general way for a user to do it.
> >
> > Likewise Allan's other suggestions do not seem to bear
> > on this problem.
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> > Allan Reese (Cefas)
> >
> > > Hate to disagree with Nick, but Stata is well-designed for
> > > intelligent graph editing.  Timothy maybe needs to fiddle
> > > with a few alternatives and work out what would show what he
> > > intends.  A log scale is one option but has many other
> > > implications.
> > >
> > > For example, it's straightforward in Stata to draw lines
> > > with/without outliers.  Other "point'n'click" packages don't
> > > make this easy, so suppress the desire.
> > >
> > > fit y x
> > > predict yhat1
> > > fit y x if y<1000
> > > predict yhat2
> > > scatter y yhat1 yhat2 x if y<1000, connect(. l l) msym(o i i) sort
> > >
> > > Another simple trick is to copy one variable into several, so
> > > subsets can be distinguished on the plot.  You could automate
> > > this (eg, using egen to save the max value of x), but I'd
> > > usually do it as part of visual editing, for example to add
> > > text labels to the points at the end of each line. It's
> > > therefore feasible to draw a line for the data excluding the
> > > outlier, and add a second line in different style pointing up
> > > with a label at its end describing the outlier.
> > >
> > > This is the type of work where I'd draft commands in a DO
> > > file so they are easily modified and re-run.
> >
> > Nick Cox
> >
> > > What you want is _not_ straightforward. I know no easy and also
> > > general way of omitting a data point from a Stata graph and also
> > > having it exert some offstage influence on the remainder of
> > > the graph.
> > >
> > > In my experience, when people think they want something like
> > > this, using a logarithmic scale for the variable concerned is
> > > usually the best way forward.
> >
> > Timothy Dang
> >
> > > > I'm making a lot of (line) plots in Stata, and mostly it's working
> > > > great, but I've hit a snag. For a few of my data sets, there are some
> > > > data points which are extraordinarily high. With the automatically
> > > > scaled axis ranges, these points are visible, but all the detail of
> > > > the rest of the data is shrunk to invisibility.
> > > >
> > > > So, I want to:
> > > > (a) enforce a maximum for the axis, hopefully showing the lines going
> > > > up towards some point not shown on the plot, and
> > > > (b) add some text describing what happens at those points (I can do
> > > > this outside Stata if needed).
> > > >
> > > > Hopefully this is straightforward and I've just missed something.
> > > > Thanks for any pointers.
