Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


FW: st: uniform distribution


From   "PAPANIKOLAOU P." <[email protected]>
To   <[email protected]>
Subject   FW: st: uniform distribution
Date   Sat, 9 Nov 2013 15:37:29 -0000

Dear All,
Thank you all for your interesting views on checking whether the data
follow a uniform distribution.
Following the discussion, I noticed that Nick put forward a script along
these lines. I have modified it for my case and present it here.

sum mpg
* Transform the variable into a normal. And what does r stand for?
gen mpg_s = (mpg - r(min)) / (r(max) - r(min))
* Create the variable that Nick suggests: the data should be weighted by
* rank-0.5 to ensure that they will not cause indeterminate values at
* zero and one in the inverse normal
gen nick_recipe = (rank - 0.5) / N
* Weight the data by the variable suggested by Nick
gen rank_mpg_s = mpg_s / nick_recipe
* Take the inverse normal of this adjusted variable and use it for
* testing the normality assumption below
gen n_mpg_s = invnormal(rank_mpg_s)
sktest n_mpg_s
* HTH: what does HTH, which Nick wrote, stand for?

Would the sktest in this script provide valid statistical evidence in
favour of independence of the observations?
In my case, I have two variables. If the script is correct, how would
running the above test ensure that it covers the independence assumption
between the two variables?

I would appreciate your input.
Many thanks
Panos
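
The mechanics of the script above can be sketched as follows. This is only an illustration under stated assumptions, not Nick's actual recipe: it assumes the auto dataset shipped with Stata (which contains mpg), and the variable names rank and p are hypothetical. The r() in r(min) and r(max) refers to the results that summarize leaves behind, and (rank - 0.5)/N gives values strictly between 0 and 1, so invnormal() is never evaluated at 0 or 1.

```stata
* Assumption: the auto dataset shipped with Stata, which contains mpg
sysuse auto, clear

summarize mpg
* r() holds the results stored by the last r-class command, here summarize
return list
local lo = r(min)
local hi = r(max)

* Rescale mpg onto [0,1]; the minimum maps to exactly 0, the maximum to 1
generate mpg_s = (mpg - `lo') / (`hi' - `lo')

* Ranks (tied values get their average rank) and positions inside (0,1)
egen rank = rank(mpg)
generate p = (rank - 0.5) / _N

summarize mpg_s p
```

Note that p depends only on the ranks of mpg, so invnormal(p) would look roughly normal whatever the distribution of mpg; the sketch shows how the pieces are computed, not a test of uniformity.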



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nikos Kakouros
Sent: 09 November 2013 14:15
To: [email protected]
Subject: Re: st: uniform distribution

David,

Thanks! That is a very neat property.
Of course, I had to see it in action...  ;-)

set obs 50000
gen nnorm=rnormal(0,1)
gen n_nnorm=normal(nnorm)
histogram n_nnorm

n_nnorm looks pretty uniform ;-)

So if it starts non-uniform, it will end up not quite so normal going
the other way around. I wonder, however, whether a test for departure
from normality of Finv(U) can really accurately test for U's departure
from uniformity. Will the p-values be accurate?

Nick Cox has, of course, in the meantime questioned the entire
applicability of uniform distribution testing given the nature of the
originally presented data (time series).

Many thanks for explaining this nice property!

Nikos
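
One way to probe the question of whether the p-values are accurate is a small simulation, sketched here under stated assumptions: the data are truly uniform, the sample size is 100, there are 1000 replications, and swilk is used as the normality test. The program name unifcheck and the seed are hypothetical choices. If the p-values behave as they should, roughly 5% of the simulated samples should have p below 0.05.

```stata
* Hypothetical helper: draw one uniform sample, transform, test normality
capture program drop unifcheck
program define unifcheck, rclass
    drop _all
    set obs 100
    generate u = runiform()
    generate z = invnormal(u)
    swilk z
    * swilk leaves its p-value in r(p)
    return scalar p = r(p)
end

set seed 2013
simulate p = r(p), reps(1000) nodots: unifcheck

* Share of samples rejected at the 5% level
count if p < 0.05
display "rejection rate = " r(N) / _N
```

The same program could be rerun with non-uniform draws in place of runiform() to see how often the test picks up a departure.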

On Sat, Nov 9, 2013 at 8:43 AM, David Hoaglin <[email protected]> wrote:
> Nikos,
>
> No approximation to the binomial distribution is involved.
>
> The approach uses a basic property of (continuous) probability 
> distributions.  If X is an observation from a distribution whose 
> cumulative distribution function (c.d.f.) is F, then U = F(X) has a
> uniform(0,1) distribution.  That is, I am transforming X by using the 
> c.d.f. of its own distribution.  This holds for any continuous 
> distribution, not just the normal distribution.
>
> The reverse of the above process starts with an observation U from
> uniform(0,1) and transforms it by the inverse of the c.d.f. of the 
> particular distribution (call it Finv).  Then X = Finv(U) is an 
> observation from the particular distribution.  This is what Fernando 
> suggested.  Of course, he did not assume that, when compressed onto 
> the interval [0,1], mpg would have a uniform distribution.  The idea 
> is that a departure from uniformity will show up as a departure from 
> normality after transforming the uniformized data by invnorm.  A 
> little problem may arise at the ends of the interval, though:
> theoretically, invnorm(0) = minus infinity and invnorm(1) = infinity.
>
> People often make "probability plots" and handle that problem by using
> "plotting positions" that do not go quite as low as 0 or as high as 1.
>  In making a probability plot (or "quantile-quantile plot") for a 
> sample of n observations vs. the uniform distribution, I would do the
> following:
> 1. Sort the observations from smallest to largest, index those with i 
> = 1 through i = n, and denote them by x(1), ..., x(n).
> 2. Calculate the corresponding plotting positions from the formula
> pp(i) = (i - (1/3))/(n + (1/3)).
> 3. Make a scatterplot of the points (pp(i), x(i)).
> 4. Assess departures from uniformity by comparing the pattern in that 
> plot against a straight line.
> 5. To get a feel for how such plots look when the data are actually 
> uniform, simulate a number of samples of n from the uniform(0,1) 
> distribution and make that plot for each sample.
> (Quantile-quantile plots for non-uniform distributions use the same 
> approach.  They use Finv(pp(i)) as horizontal coordinate of the plot.)
>
> David Hoaglin
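
The five steps above can be sketched in Stata as follows. This is a minimal illustration, assuming the auto dataset shipped with Stata (for mpg) and a sample size of 74 for the simulated samples; the variable name pp, the graph names, and the seed are hypothetical choices.

```stata
* Assumption: the auto dataset shipped with Stata, which contains mpg
sysuse auto, clear

* Steps 1-2: sort, then plotting positions pp(i) = (i - 1/3)/(n + 1/3)
sort mpg
generate pp = (_n - 1/3) / (_N + 1/3)

* Steps 3-4: scatterplot of (pp(i), x(i)), to compare with a straight line
scatter mpg pp, name(qq_mpg, replace)

* Step 5: the same plot for a few samples that really are uniform(0,1)
set seed 12345
forvalues s = 1/4 {
    clear
    set obs 74
    generate u = runiform()
    sort u
    generate pp = (_n - 1/3) / (_N + 1/3)
    scatter u pp, name(sim`s', replace)
}
```

Comparing qq_mpg with the graphs sim1 to sim4 gives a feel for how much wobble around a straight line is ordinary at this sample size.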
>
> On Sat, Nov 9, 2013 at 7:58 AM, Nikos Kakouros <[email protected]> wrote:
>> Fernando,
>>
>> That seems to work pretty well (did a run below).
>> I'm not entirely sure why it should work though.
>>
>> Is it because the normal distribution in this case works as an 
>> approximation to the binomial distribution?
>>
>> Nikos
>>
>>
>>
>> set obs 50000
>> gen test=runiform()
>> sort test
>> histogram test
>> gen n_test=invnormal(test)
>> histogram  n_test, normal
>> swilk  n_test
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2018 StataCorp LLC