Re: st: CDF plot with normal probability axis


From   David Hoaglin <[email protected]>
To   [email protected]
Subject   Re: st: CDF plot with normal probability axis
Date   Thu, 14 Nov 2013 07:17:16 -0500

Nick,

For plotting positions, I prefer (i - (1/3))/(n + (1/3)).  John Tukey
introduced these after analyzing the sampling distributions of the
order statistics in a sample of n from the uniform distribution on
(0,1). The expression above is a good approximation for the median of
the sampling distribution of the i-th order statistic in such a sample
(a slight modification improves the approximation when i = 1 and i =
n).  In a Q-Q plot against a distribution with c.d.f. F, the plotting
positions (from any definition) are transformed by F-inverse.  Since
monotonic transformations preserve medians, the transformed plotting
positions are good approximations for the medians of the sampling
distributions of the order statistics of a sample from the chosen
distribution.
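
As a minimal sketch of the paragraph above (not code from the original exchange), the plotting positions (i - 1/3)/(n + 1/3) can be computed and passed through invnormal() for a normal Q-Q plot. The variable names rank_mpg, pp_mpg and z_mpg are made up for illustration, and the slight modification for i = 1 and i = n is not applied here.

```stata
sysuse auto, clear

* ranks i = 1, ..., n (tied values get average ranks, the egen default)
egen rank_mpg = rank(mpg)

* n = number of nonmissing values of mpg
quietly count if !missing(mpg)
local n = r(N)

* plotting positions (i - 1/3)/(n + 1/3)
generate pp_mpg = (rank_mpg - 1/3) / (`n' + 1/3)

* transform by F-inverse; here F is the standard normal c.d.f.
generate z_mpg = invnormal(pp_mpg)

* data against approximate medians of the normal order statistics
scatter mpg z_mpg, xtitle(standard normal deviate)
```

Every position lies strictly between 0 and 1, so invnormal() is defined at both ends of the sample.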

David Hoaglin

On Thu, Nov 14, 2013 at 4:21 AM, Nick Cox <[email protected]> wrote:
> -distplot- (SJ), -cdfplot- (STB originally, SSC now): as always,
> please explain the origin of the user-written commands you refer to.
>
> -qplot- (SJ) can do this, roughly.
>
> . sysuse auto
> (1978 Automobile Data)
>
> . qplot turn trunk, trscale(invnormal(@))
>
> . qplot turn trunk, trscale(invnormal(@)) xtitle(standard normal deviate) xla(-2/2)
>
> The axes are the other way round from what you ask; I'd argue that is
> better practice, or at least consistent with -qnorm-. (-ysc(log)- is
> also possible.)
>
> Note that you should not expect cumulative distribution plots to do
> this by default as they usually plot cumulative probabilities as 1/n,
> ..., n/n and -invnormal(n/n)- is -invnormal(1)- and as such
> indeterminate.
>
> But it is as easy to do this pretty much from first principles. See e.g.
>
> http://www.stata.com/support/faqs/statistics/percentile-ranks-and-plotting-positions/index.html
>
> http://www.stata-journal.com/sjpdf.html?articlenum=gr0027
>
> http://www.stata-journal.com/sjpdf.html?articlenum=gr0032
>
> I will cheat slightly and use -mylabels- (SSC).
>
> Here is some code. Any number of possible small variations should be evident.
>
> sysuse auto, clear
>
> replace price = price/1000
>
> foreach v in price mpg {
>     egen y`v' = rank(`v')
>     su `v', meanonly
>     replace y`v' = invnormal((y`v' - 0.5) / r(N))
>     label var y`v' "`: var label `v''"
> }
>
> mylabels 1 5 10(10)90 95 99, myscale(invnormal(@/100)) local(labels)
>
> twoway connect yprice price, ms(Dh) sort || ///
> connect ympg mpg, sort ms(Th) xsc(log) yla(`labels', ang(h)) xla(5 10 20 40) ///
> ytitle(Cumulative percent)
>
> Nick
> [email protected]