Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | David Hoaglin <dchoaglin@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: uniform distribution |
Date | Sat, 9 Nov 2013 08:43:23 -0500 |
Nikos,

No approximation to the binomial distribution is involved. The approach uses a basic property of (continuous) probability distributions. If X is an observation from a distribution whose cumulative distribution function (c.d.f.) is F, then U = F(X) has a uniform(0,1) distribution. That is, I am transforming X by using the c.d.f. of its own distribution. This holds for any continuous distribution, not just the normal distribution.

The reverse of the above process starts with an observation U from uniform(0,1) and transforms it by the inverse of the c.d.f. of the particular distribution (call it Finv). Then X = Finv(U) is an observation from that distribution. This is what Fernando suggested. Of course, he did not assume that, when compressed onto the interval [0,1], mpg would have a uniform distribution. The idea is that a departure from uniformity will show up as a departure from normality after the uniformized data are transformed by invnorm.

A small problem may arise at the ends of the interval, though: theoretically, invnorm(0) = minus infinity and invnorm(1) = infinity. People often make "probability plots" and handle that problem by using "plotting positions" that do not go quite as low as 0 or as high as 1.

In making a probability plot (or "quantile-quantile plot") for a sample of n observations vs. the uniform distribution, I would do the following:

1. Sort the observations from smallest to largest, index them with i = 1 through i = n, and denote them by x(1), ..., x(n).
2. Calculate the corresponding plotting positions from the formula pp(i) = (i - (1/3))/(n + (1/3)).
3. Make a scatterplot of the points (pp(i), x(i)).
4. Assess departures from uniformity by comparing the pattern in that plot against a straight line.
5. To get a feel for how such plots look when the data are actually uniform, simulate a number of samples of size n from the uniform(0,1) distribution and make that plot for each sample.
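The transforms and the plotting recipe above can be checked numerically. This is a minimal sketch, not part of the original thread, written in Python rather than Stata so that it runs stand-alone; it assumes only the Python standard library (`statistics.NormalDist` supplies F and Finv for the standard normal, `random` supplies the simulated samples). The function names `plotting_positions`, `qq_points`, and `max_gap_from_line`, the seed, and the sample sizes are hypothetical choices for illustration. `max_gap_from_line` is only a crude numeric stand-in for step 4, which is meant to be judged by eye.

```python
import random
from statistics import NormalDist


def plotting_positions(n):
    """Plotting positions pp(i) = (i - 1/3) / (n + 1/3) for i = 1, ..., n.

    All values lie strictly inside (0, 1), so an inverse c.d.f. such as
    invnorm never has to be evaluated at 0 or 1.
    """
    return [(i - 1 / 3) / (n + 1 / 3) for i in range(1, n + 1)]


def qq_points(sample, finv=lambda p: p):
    """Return the points (Finv(pp(i)), x(i)) of a probability plot.

    With the default identity finv this is the plot against uniform(0,1);
    passing NormalDist().inv_cdf gives a normal probability plot.
    """
    xs = sorted(sample)                              # step 1: x(1) <= ... <= x(n)
    pps = plotting_positions(len(xs))                # step 2: pp(i)
    return [(finv(p), x) for p, x in zip(pps, xs)]   # step 3: points to scatter


def max_gap_from_line(points):
    """Hypothetical helper: largest vertical distance from the line y = x."""
    return max(abs(x - p) for p, x in points)


rng = random.Random(2013)   # arbitrary seed, for reproducibility only
std_normal = NormalDist()   # mean 0, standard deviation 1

# Forward direction: if X ~ F, then U = F(X) ~ uniform(0,1).
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
u = [std_normal.cdf(v) for v in x]

# Reverse direction: X = Finv(U) is again an observation from F.
x_back = [std_normal.inv_cdf(v) for v in u]

# Step 5: reference plots from samples that really are uniform(0,1).
n = 50
reference_gaps = [
    max_gap_from_line(qq_points([rng.random() for _ in range(n)]))
    for _ in range(20)
]
```

The point lists can be passed to any scatterplot routine. In Stata, assuming a variable x with no missing values, the same plotting positions could be generated after `sort x` with `gen pp = (_n - 1/3)/(_N + 1/3)`.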
(Quantile-quantile plots for non-uniform distributions use the same approach. They use Finv(pp(i)) as the horizontal coordinate of the plot.)

David Hoaglin

On Sat, Nov 9, 2013 at 7:58 AM, Nikos Kakouros <nkakouros@gmail.com> wrote:
> Fernando,
>
> That seems to work pretty well (did a run below).
> I'm not entirely sure why it should work though.
>
> Is it because the normal distribution in this case works as an
> approximation to the binomial distribution?
>
> Nikos
>
> set obs 50000
> gen test=runiform()
> sort test
> histogram test
> gen n_test=invnormal(test)
> histogram n_test, normal
> swilk n_test

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/