Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nikos Kakouros <nkakouros@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: FW: st: uniform distribution
Date: Sat, 9 Nov 2013 11:02:40 -0500

I have already stood corrected.

To continue the appropriate finger pointing and contribution attribution, the approach of converting with the invnormal function was actually suggested by Fernando and further explained by David Hoaglin. My understanding is that Fernando's suggestion is an excellent and correct answer to the question of how to test for uniformity (assuming the min/max of the distribution are represented by the min/max of the sample and acknowledging the issue of the asymptotic values at 0 and 1...).

What Nick Cox also pointed out, however, is that it's the question itself that is wrong...

Nikos

On Sat, Nov 9, 2013 at 10:44 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> I think this was suggested by Nikos Kakouros before he read my
> comments. Either way, it was not suggested by me (Nick Cox, a
> different contributor to the list) and I don't endorse it. In case
> it's not clear, I consider this approach to be incorrect for the
> reasons I identified earlier today.
> Nick
> njcoxstata@gmail.com
>
>
> On 9 November 2013 15:37, PAPANIKOLAOU P. <P.Papanikolaou@swansea.ac.uk> wrote:
>> Dear All,
>> Thank you so much to you all for providing interesting views regarding
>> checking whether the data follow the uniform distribution.
>> Following through the discussion, I have noticed that Nick has put
>> forward a script along these lines, modified to my case, which is
>> presented just now.
>>
>> sum mpg
>> gen mpg_s=(mpg-r(min)) / (r(max)-r(min)) * transform the variable into a
>> normal, AND what r stands for?
>> gen nick_recipe = (rank-0.5) / N * CREATE the variable that Nick
>> suggests that the data should be weighted by rank-0.5 to ensure that
>> they will not cause indeterminate values at zero and one in the inverse
>> normal
>> gen rank_mpg_s = mpg_s / nick_recipe * weigh the data by the variable
>> suggested by Nick
>> gen n_mpg_s = invnormal(rank_mpg_s) * take the inverse normal of this
>> adjusted variable and use this VARIABLE for testing the normality
>> assumption below
>> sktest n_mpg_s HTH * WHAT HTH- that Nick wrote - stands for ?
>>
>> Through this script, would sktest provide valid statistical evidence
>> in favour of independence of observations?
>> In my case, I have got 2 variables. By running the above test, how
>> would this script, if correct, ensure that it covers the independence
>> assumption between the TWO variables?
>>
>> I would appreciate your input.
>> Many thanks
>> Panos
>>
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nikos
>> Kakouros
>> Sent: 09 November 2013 14:15
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: uniform distribution
>>
>> David,
>>
>> Thanks! That is a very neat property.
>> Of course, I had to see it in action... ;-)
>> set obs 50000
>> gen nnorm=rnormal(0,1)
>> gen n_nnorm=normal(nnorm)
>> histogram n_nnorm
>>
>> n_nnorm looks pretty uniform ;-)
>>
>> So if it starts non-uniform, it will end up not quite so normal the other
>> way around. I wonder, however, whether a test for a departure from
>> normality for the Finv(U) can really accurately test for U's departure
>> from uniformity. Will the p's be accurate?
>>
>> Nick Cox has, of course, in the meantime questioned the entire
>> applicability of uniform distribution testing given the nature of the
>> originally presented data (time series).
>>
>> Many thanks for explaining this nice property!
>>
>> Nikos
>>
>> On Sat, Nov 9, 2013 at 8:43 AM, David Hoaglin <dchoaglin@gmail.com> wrote:
>>> Nikos,
>>>
>>> No approximation to the binomial distribution is involved.
>>>
>>> The approach uses a basic property of (continuous) probability
>>> distributions. If X is an observation from a distribution whose
>>> cumulative distribution function (c.d.f.) is F, then U = F(X) has a
>>> uniform(0,1) distribution. That is, I am transforming X by using the
>>> c.d.f. of its own distribution. This holds for any continuous
>>> distribution, not just the normal distribution.
>>>
>>> The reverse of the above process starts with an observation U from
>>> uniform(0,1) and transforms it by the inverse of the c.d.f. of the
>>> particular distribution (call it Finv). Then X = Finv(U) is an
>>> observation from the particular distribution. This is what Fernando
>>> suggested. Of course, he did not assume that, when compressed onto
>>> the interval [0,1], mpg would have a uniform distribution. The idea
>>> is that a departure from uniformity will show up as a departure from
>>> normality after transforming the uniformized data by invnorm. A
>>> little problem may arise at the ends of the interval, though:
>>> theoretically, invnorm(0) = minus infinity and invnorm(1) = infinity.
>>>
>>> People often make "probability plots" and handle that problem by using
>>> "plotting positions" that do not go quite as low as 0 or as high as 1.
>>> In making a probability plot (or "quantile-quantile plot") for a
>>> sample of n observations vs. the uniform distribution, I would do the
>>> following:
>>> 1. Sort the observations from smallest to largest, index those with i
>>> = 1 through i = n, and denote them by x(1), ..., x(n).
>>> 2. Calculate the corresponding plotting positions from the formula
>>> pp(i) = (i - (1/3))/(n + (1/3)).
>>> 3. Make a scatterplot of the points (pp(i), x(i)).
>>> 4. Assess departures from uniformity by comparing the pattern in that
>>> plot against a straight line.
>>> 5. To get a feel for how such plots look when the data are actually
>>> uniform, simulate a number of samples of n from the uniform(0,1)
>>> distribution and make that plot for each sample.
>>> (Quantile-quantile plots for non-uniform distributions use the same
>>> approach. They use Finv(pp(i)) as horizontal coordinate of the plot.)
>>>
>>> David Hoaglin
>>>
>>> On Sat, Nov 9, 2013 at 7:58 AM, Nikos Kakouros <nkakouros@gmail.com> wrote:
>>>> Fernando,
>>>>
>>>> That seems to work pretty well (did a run below).
>>>> I'm not entirely sure why it should work though.
>>>>
>>>> Is it because the normal distribution in this case works as an
>>>> approximation to the binomial distribution?
>>>>
>>>> Nikos
>>>>
>>>>
>>>>
>>>> set obs 50000
>>>> gen test=runiform()
>>>> sort test
>>>> histogram test
>>>> gen n_test=invnormal(test)
>>>> histogram n_test, normal
>>>> swilk n_test
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
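
Editorial sketch (not part of the original messages): the plotting-position steps that David Hoaglin lists in the thread might be tried in Stata roughly as follows. This is only an illustration under stated assumptions: it uses Stata's bundled auto data (sysuse auto) as a stand-in for the poster's variable, assumes the variable has no missing values, and the variable names pp and u, the graph names qq_mpg and qq_sim, and the seed are hypothetical choices made for the example. The reference line y = x in the second plot is an addition for orientation and is not mentioned in the thread.

```stata
* Sketch only: assumes Stata's bundled auto data as a stand-in for the real variable
sysuse auto, clear

* Steps 1-2: sort, then plotting positions pp(i) = (i - 1/3)/(n + 1/3)
sort mpg
generate pp = (_n - 1/3) / (_N + 1/3)

* Steps 3-4: plot x(i) against pp(i); uniform data would fall near a straight line
scatter mpg pp, name(qq_mpg, replace)

* Step 5: the same plot for one simulated uniform(0,1) sample of the same size
local n = _N
clear
set seed 2013                      // hypothetical seed, for reproducibility only
set obs `n'
generate u = runiform()
sort u
generate pp = (_n - 1/3) / (_N + 1/3)
twoway (scatter u pp) (function y = x, range(0 1)), name(qq_sim, replace)
```

Repeating the simulated part with different seeds gives a feel for how much wobble around a straight line is ordinary for a sample of this size. As noted in the thread, none of this addresses whether testing for uniformity is the right question for time-series data.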