David Hoaglin <[email protected]>
Thu, 14 Nov 2013 07:17:16 -0500
For plotting positions, I prefer (i - (1/3))/(n + (1/3)). John Tukey
introduced these after analyzing the sampling distributions of the
order statistics in a sample of n from the uniform distribution on
(0,1). The expression above is a good approximation for the median of
the sampling distribution of the i-th order statistic in such a sample
(a slight modification improves the approximation when i = 1 and i =
n). In a Q-Q plot against a distribution with c.d.f. F, the plotting
positions (from any definition) are transformed by F-inverse. Since
monotonic transformations preserve medians, the transformed plotting
positions are good approximations for the medians of the sampling
distributions of the order statistics of a sample from the chosen
distribution.
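
As a rough illustration of the above, here is a minimal sketch in Stata,
assuming the auto data and a normal reference distribution. The variable
names i, pp, exact and z are invented for the example. It computes the
plotting positions (i - (1/3))/(n + (1/3)), sets them beside the exact
medians of the uniform order statistics (the i-th order statistic in a
sample of n from the uniform on (0,1) follows a beta distribution with
parameters i and n - i + 1), and then transforms by F-inverse.

```stata
* Sketch only: i, pp, exact and z are hypothetical variable names.
sysuse auto, clear

* Ranks 1, ..., n; the unique option breaks ties arbitrarily.
egen i = rank(price), unique
count if !missing(price)
local n = r(N)

* Plotting positions (i - 1/3)/(n + 1/3).
generate pp = (i - 1/3) / (`n' + 1/3)

* Exact median of the i-th uniform order statistic, for comparison.
generate exact = invibeta(i, `n' - i + 1, 0.5)

* Transform by F-inverse; here F is the standard normal c.d.f.
generate z = invnormal(pp)

scatter price z, xtitle("standard normal deviate")
```

Listing pp and exact side by side (list i pp exact, for instance) shows
how close the approximation is for a given n. A different reference
distribution would only need a different inverse c.d.f. in place of
invnormal().
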
David Hoaglin
On Thu, Nov 14, 2013 at 4:21 AM, Nick Cox <[email protected]> wrote:
> -distplot- (SJ), -cdfplot- (STB originally, SSC now): as always,
> please explain the origin of the user-written commands you refer to.
> -qplot- (SJ) can do this, roughly.
> . sysuse auto
> (1978 Automobile Data)
> . qplot turn trunk, trscale(invnormal(@))
> . qplot turn trunk, trscale(invnormal(@)) xtitle(standard normal deviate) xla(-2/2)
> The axes are the other way round from what you ask; I'd argue that is
> better practice, or at least consistent with -qnorm-. (-ysc(log)- is
> also possible.)
> Note that you should not expect cumulative distribution plots to do
> this by default as they usually plot cumulative probabilities as 1/n,
> ..., n/n and -invnormal(n/n)- is -invnormal(1)- and as such
> indeterminate.
> But it is as easy to do this pretty much from first principles. See e.g.
> I will cheat slightly and use -mylabels- (SSC).
> Here is some code. Any number of possible small variations should be evident.
> sysuse auto, clear
> replace price = price/1000
> foreach v in price mpg {
>     egen y`v' = rank(`v')
>     su `v', meanonly
>     replace y`v' = invnormal((y`v' - 0.5) / r(N))
>     label var y`v' "`: var label `v''"
> }
> mylabels 1 5 10(10)90 95 99, myscale(invnormal(@/100)) local(labels)
> twoway connect yprice price, ms(Dh) sort || ///
>     connect ympg mpg, sort ms(Th) xsc(log) yla(`labels', ang(h)) xla(5 10 20 40) ///
>     ytitle(Cumulative percent)
> Nick
