
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: uniform distribution


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: uniform distribution
Date   Sat, 9 Nov 2013 15:23:10 +0000

Quite so. Thanks for the correction.

Nick
[email protected]


On 9 November 2013 13:40, Nikos Kakouros <[email protected]> wrote:
> Dear Nick,
>
> Incisive as always! Uniformity is certainly unnatural!
> The issues of the assumed constraints, indeterminate 0 and 1 due to
> asymptotic normal function and the importance of value order are real
> fudges...
> My only comment would be that the scaling is by (value - min) /(max -
> min) ... :) Just pulling your leg.
>
> Nikos
>
>
> On Sat, Nov 9, 2013 at 8:31 AM, Nick Cox <[email protected]> wrote:
>> Let's take this more slowly. It looks like a case of answering a
>> poster's question when the real problem is otherwise.
>>
>> 1. I would be interested to learn of examples to the contrary, but the
>> hypothesis of a uniform distribution (unqualified) does not seem to arise
>> naturally. In contrast, the hypothesis that a variable is uniform on
>> some interval [a, b] does arise and in that case a, b are known
>> constants that follow from the nature of the variable.
>>
>> 2. Panos wants to scale values by (value - max) / (max - min) to [0,1]
>> which amounts to arguing that the uniform being tested for has known
>> extremes, namely the sample extremes. That needs a story.
>>
>> 3. Panos wants to plug the scaled values into -invnormal()-. However,
>> -invnormal(0)- and -invnormal(1)- are indeterminate. Usually when
>> people plug probabilities into -invnormal()- they ensure that the
>> arguments belong to (0,1), e.g. by using a recipe such as (rank - 0.5)
>> / sample size.
>>
>> 4. Panos's examples are time series
>>
>> MONTH  MS_COHO     UK_MS
>> Apri        396      62986
>> Aug        330      67503
>> Dec        342      65218
>> Feb        348   59491.83
>> Jan        379   65502.33
>> Jul        377    68214.5
>> Jun        368   65511.33
>> Mar        419   65112.17
>> May        423   66152.34
>> Nov        328   65107.67
>> Oct        347   68344.16
>> Sep        356   67597.34
>>
>> What these variables are is not made clear, but my guess is that the
>> problem is not about testing uniformity of distribution at all, but
>> about testing for seasonality, which is a quite different problem.
>> Ignoring the serial order is pointless in that case; it is a vital
>> part of the information.
>>
>> 5. Regardless of whether that guess about the real problem is correct,
>> Panos can't assume _independence_ of observations willy-nilly; that is
>> an assumption that has to be justified.
>>
>> Whatever the answer to (4), a P-value from e.g. Shapiro-Wilk can't be
>> taken very seriously here because of the fudges involved in
>> translating the original problem to a quite different one.
>>
>> Nick
>> [email protected]
>>
>>
>> On 9 November 2013 12:58, Nikos Kakouros <[email protected]> wrote:
>>> Fernando,
>>>
>>> That seems to work pretty well (did a run below).
>>> I'm not entirely sure why it should work though.
>>>
>>> Is it because the normal distribution in this case works as an
>>> approximation to the binomial distribution?
>>>
>>> Nikos
>>>
>>>
>>>
>>> set obs 50000
>>> gen test=runiform()
>>> sort test
>>> histogram test
>>> gen n_test=invnormal(test)
>>> histogram  n_test, normal
>>> swilk  n_test
>>>
>>>
>>>
>>> On Fri, Nov 8, 2013 at 3:58 PM, Fernando Rios Avila <[email protected]> wrote:
>>>> What about standardizing the variable toward an index from 0 to 1.
>>>> say:
>>>> sum mpg
>>>> gen mpg_s=(mpg-r(min))/(r(max)-r(min))
>>>> Transform it into a normal
>>>> gen n_mpg_s=invnormal(mpg_s)
>>>> and then make a normality test of this variable
>>>> sktest n_mpg_s
>>>> HTH
>>>> Fernando
>>>>
>>>> On Fri, Nov 8, 2013 at 3:53 PM, Nick Cox <[email protected]> wrote:
>>>>> -egen, count()- on a variable just puts a constant in a variable,
>>>>> namely the number of non-missing values, which is useless for your
>>>>> purpose.
>>>>>
>>>>> The best test of uniformity is graphical: -quantile- by accident if
>>>>> not design yields the appropriate graph. Otherwise think of
>>>>> chi-square, Kolmogorov-Smirnov, etc.
>>>>>
>>>>> For "STATA" read "Stata".
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 8 November 2013 18:09, PAPANIKOLAOU P. <[email protected]> wrote:
>>>>>
>>>>>> I am a fairly new user to STATA. I have got to check whether each of
>>>>>> these two variables (column  2: MS_COHO; column 3: UK_MS) follows the
>>>>>> uniform distribution.
>>>>>> For each of them, I used the following code, properly adjusted:
>>>>>>
>>>>>> egen n = count (mpg)  // use MS_COHO and UK_MS each time ... drop n i
>>>>>> surprisingly, the results were identical in both attempts, though the
>>>>>> script was applied to two different variables.
>>>>>> MONTH  MS_COHO     UK_MS
>>>>>> Apri        396      62986
>>>>>> Aug        330      67503
>>>>>> Dec        342      65218
>>>>>> Feb        348   59491.83
>>>>>> Jan        379   65502.33
>>>>>> Jul        377    68214.5
>>>>>> Jun        368   65511.33
>>>>>> Mar        419   65112.17
>>>>>> May        423   66152.34
>>>>>> Nov        328   65107.67
>>>>>> Oct        347   68344.16
>>>>>> Sep        356   67597.34
>>>>>> Sep        356   67597.34
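
The points raised in this thread can be tried out with a short sketch. It is an illustration only, not anything posted by the participants. It assumes simulated data that are uniform on a known interval, with hypothetical bounds a = 10 and b = 15 chosen purely for the example, and it assumes the observations are independent, which (as noted in the thread) has to be justified for real data. It compares min-max scaling with the (rank - 0.5) / sample size recipe as arguments to -invnormal()-, then runs -ksmirnov- against the uniform distribution function and draws the -quantile- plot.

```stata
* Sketch only: simulated data, uniform on a known interval [a, b).
* The bounds 10 and 15 are assumptions made for this example.
clear
set seed 20131109
set obs 200
local a = 10
local b = 15
generate x = `a' + (`b' - `a') * runiform()

* Min-max scaling by the sample extremes maps the smallest value to 0
* and the largest to 1; -invnormal()- should return missing there.
summarize x, meanonly
generate x_s = (x - r(min)) / (r(max) - r(min))
generate z_minmax = invnormal(x_s)
count if missing(z_minmax)

* Plotting positions (rank - 0.5) / n stay strictly inside (0, 1).
egen rank_x = rank(x)
generate p = (rank_x - 0.5) / _N
generate z_rank = invnormal(p)
count if missing(z_rank)

* Kolmogorov-Smirnov test against the uniform on the known [a, b]:
* the hypothesized distribution function is (x - a) / (b - a).
ksmirnov x = (x - `a') / (`b' - `a')

* Graphical check: for uniform data the quantile plot should lie
* close to a straight line.
quantile x
```

Because the local macros `a' and `b' are fixed in advance rather than estimated from the sample, the test matches the case described in the thread where the interval follows from the nature of the variable. Run the block as a do-file so that the local macros are defined when the later lines use them.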