Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: FW: st: uniform distribution

From	Nikos Kakouros <[email protected]>
To	[email protected]
Subject	Re: FW: st: uniform distribution
Date	Sat, 9 Nov 2013 11:02:40 -0500

I have already stood corrected.
To continue the appropriate finger pointing and contribution
attribution, the approach of converting with the invnormal function
was actually suggested by Fernando and further explained by David
Hoaglin.
My understanding is that Fernando's suggestion is an excellent and
correct answer to the question of how to test for uniformity (assuming
the min/max of the distribution are represented by the min/max of the
sample and acknowledging the issue of the asymptotic values at 0 and
1...).

What Nick Cox also pointed out, however, is that the question itself
is the wrong one to ask...
Nikos


On Sat, Nov 9, 2013 at 10:44 AM, Nick Cox <[email protected]> wrote:
> I think this was suggested by Nikos Kakouros before he read my
> comments. Either way, it was not suggested by me (Nick Cox, a
> different contributor to the list) and I don't endorse it. In case
> it's not clear, I consider this approach to be incorrect for the
> reasons I identified earlier today.
> Nick
> [email protected]
>
>
> On 9 November 2013 15:37, PAPANIKOLAOU P. <[email protected]> wrote:
>> Dear All,
>> Thank you all so much for your interesting views on checking whether
>> the data follow the uniform distribution.
>> Following the discussion, I noticed that Nick put forward a script
>> along these lines, which I have modified for my case and present
>> below.
>>
>> sum mpg
>> * transform the variable into a normal, AND what does r stand for?
>> gen mpg_s=(mpg-r(min)) / (r(max)-r(min))
>> * CREATE the variable that Nick suggests: the data should be weighted
>> * by rank-0.5 to ensure that they will not cause indeterminate values
>> * at zero and one in the inverse normal
>> gen nick_recipe = (rank-0.5) / N
>> * weigh the data by the variable suggested by Nick
>> gen rank_mpg_s = mpg_s / nick_recipe
>> * take the inverse normal of this adjusted variable and use this
>> * VARIABLE for testing the normality assumption below
>> gen n_mpg_s = invnormal(rank_mpg_s)
>> * WHAT does HTH - which Nick wrote - stand for?
>> sktest n_mpg_s HTH
>>
>> Would the sktest in this script provide valid statistical evidence
>> in favour of independence of the observations?
>> In my case, I have 2 variables. If this script is correct, how would
>> running the above test cover the independence assumption between the
>> TWO variables?
>>
>> I would appreciate your input.
>> Many thanks
>> Panos
>>
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Nikos
>> Kakouros
>> Sent: 09 November 2013 14:15
>> To: [email protected]
>> Subject: Re: st: uniform distribution
>>
>> David,
>>
>> Thanks! That is a very neat property.
>> Of course, I had to see it in action...  ;-)
>>
>> set obs 50000
>> gen nnorm=rnormal(0,1)
>> gen n_nnorm=normal(nnorm)
>> histogram n_nnorm
>>
>> n_nnorm looks pretty uniform ;-)
>>
>> So if it starts non-uniform, it will end up not quite normal going the
>> other way. I wonder, however, whether a test for a departure from
>> normality of Finv(U) can really accurately test for U's departure
>> from uniformity. Will the p-values be accurate?
>>
>> Nick Cox has, of course, in the meantime questioned the entire
>> applicability of uniform distribution testing given the nature of the
>> originally presented data (time series).
>>
>> Many thanks for explaining this nice property!
>>
>> Nikos
>>
>> On Sat, Nov 9, 2013 at 8:43 AM, David Hoaglin <[email protected]>
>> wrote:
>>> Nikos,
>>>
>>> No approximation to the binomial distribution is involved.
>>>
>>> The approach uses a basic property of (continuous) probability
>>> distributions.  If X is an observation from a distribution whose
>>> cumulative distribution function (c.d.f.) is F, then U = F(X) has a
>>> uniform(0,1) distribution.  That is, I am transforming X by using the
>>> c.d.f. of its own distribution.  This holds for any continuous
>>> distribution, not just the normal distribution.
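
The property described above can be checked for a non-normal case with a small simulation. This is only an illustrative sketch: the exponential distribution, the seed, the sample size, and the variable names x and u are assumptions chosen for the example, not part of the original discussion.

```stata
clear
set seed 12345
set obs 50000

* X ~ exponential with mean 1, drawn by inverse-c.d.f. sampling
gen double x = -ln(runiform())

* U = F(X), where F(x) = 1 - exp(-x) is the c.d.f. of X itself
gen double u = 1 - exp(-x)

* if the property holds, the bars should be roughly level on (0,1)
histogram u, width(0.05) start(0)
```

Replacing the exponential with another continuous distribution, together with its own c.d.f., should give a similarly flat histogram.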
>>>
>>> The reverse of the above process starts with an observation U from
>>> uniform(0,1) and transforms it by the inverse of the c.d.f. of the
>>> particular distribution (call it Finv).  Then X = Finv(U) is an
>>> observation from the particular distribution.  This is what Fernando
>>> suggested.  Of course, he did not assume that, when compressed onto
>>> the interval [0,1], mpg would have a uniform distribution.  The idea
>>> is that a departure from uniformity will show up as a departure from
>>> normality after transforming the uniformized data by invnorm.  A
>>> little problem may arise at the ends of the interval, though:
>>> theoretically, invnorm(0) = minus infinity and invnorm(1) = infinity.
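
To see the end-point problem concretely, the sketch below rescales a variable by its sample minimum and maximum and applies invnormal(). It assumes that mpg comes from Stata's shipped auto dataset (the thread does not say so), and the names mpg_s and z are illustrative. invnormal() returns missing at exactly 0 and 1, so the extreme observations silently drop out of any later test.

```stata
sysuse auto, clear

* compress mpg onto [0,1] using the sample minimum and maximum
summarize mpg
gen double mpg_s = (mpg - r(min)) / (r(max) - r(min))

* invnormal(0) and invnormal(1) are not finite, so Stata returns missing
gen double z = invnormal(mpg_s)

* the observations at the sample minimum and maximum are lost
count if missing(z)
list mpg mpg_s z if missing(z)
```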
>>>
>>> People often make "probability plots" and handle that problem by using
>>> "plotting positions" that do not go quite as low as 0 or as high as 1.
>>> In making a probability plot (or "quantile-quantile plot") for a
>>> sample of n observations vs. the uniform distribution, I would do the
>>> following:
>>> 1. Sort the observations from smallest to largest, index those with i
>>> = 1 through i = n, and denote them by x(1), ..., x(n).
>>> 2. Calculate the corresponding plotting positions from the formula
>>> pp(i) = (i - (1/3))/(n + (1/3)).
>>> 3. Make a scatterplot of the points (pp(i), x(i)).
>>> 4. Assess departures from uniformity by comparing the pattern in that
>>> plot against a straight line.
>>> 5. To get a feel for how such plots look when the data are actually
>>> uniform, simulate a number of samples of n from the uniform(0,1)
>>> distribution and make that plot for each sample.
>>> (Quantile-quantile plots for non-uniform distributions use the same
>>> approach.  They use Finv(pp(i)) as horizontal coordinate of the plot.)
>>>
>>> David Hoaglin
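
The five steps above might be sketched in Stata roughly as follows. This is an illustration under assumptions rather than code from the thread: it uses mpg from the shipped auto dataset, and the seed, the number of simulated samples (4), and the names pp, u1-u4, qq_mpg, and sim1-sim4 are arbitrary choices.

```stata
sysuse auto, clear

* steps 1-2: order the data and compute pp(i) = (i - 1/3)/(n + 1/3)
sort mpg
gen double pp = (_n - 1/3) / (_N + 1/3)

* steps 3-4: plot the ordered data against the plotting positions;
* uniform data should fall close to a straight line
scatter mpg pp, ytitle("Ordered mpg") xtitle("Plotting position") ///
    name(qq_mpg, replace)

* step 5: the same plot for simulated uniform(0,1) samples of size n
set seed 2013
forvalues s = 1/4 {
    gen double u`s' = runiform()
    * pp depends only on row position, so it stays valid after re-sorting
    sort u`s'
    scatter u`s' pp, ytitle("Ordered u`s'") xtitle("Plotting position") ///
        name(sim`s', replace)
}
graph combine qq_mpg sim1 sim2 sim3 sim4
```

Comparing the first panel with the simulated ones gives a rough feel for how much wiggle around a straight line is ordinary at this sample size. None of this addresses the separate question of whether the observations are independent.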
>>>
>>> On Sat, Nov 9, 2013 at 7:58 AM, Nikos Kakouros <[email protected]> wrote:
>>>> Fernando,
>>>>
>>>> That seems to work pretty well (did a run below).
>>>> I'm not entirely sure why it should work though.
>>>>
>>>> Is it because the normal distribution in this case works as an
>>>> approximation to the binomial distribution?
>>>>
>>>> Nikos
>>>>
>>>>
>>>>
>>>> set obs 50000
>>>> gen test=runiform()
>>>> sort test
>>>> histogram test
>>>> gen n_test=invnormal(test)
>>>> histogram  n_test, normal
>>>> swilk  n_test
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
