Re: st: uniform distribution

From   Nikos Kakouros <[email protected]>
To   [email protected]
Subject   Re: st: uniform distribution
Date   Sat, 9 Nov 2013 08:40:53 -0500

Dear Nick,

Incisive as always! Uniformity is certainly unnatural!
The issues of the assumed constraints, the indeterminate 0 and 1 due to
the asymptotic normal function, and the importance of value order are real.
My only comment would be that the scaling is by (value - min) /(max -
min) ... :) Just pulling your leg.
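
A minimal sketch of the scaling and endpoint issues, assuming the auto
data shipped with Stata (as in Fernando's example); the names mpg_s, z_s,
rk, p and z are hypothetical. Min-max scaling puts the sample extremes at
exactly 0 and 1, where -invnormal()- returns missing, whereas plotting
positions of the form (rank - 0.5) / sample size stay inside (0,1).

```stata
sysuse auto, clear

* min-max scaling: the numerator is (value - min), not (value - max)
summarize mpg
generate mpg_s = (mpg - r(min)) / (r(max) - r(min))

* the sample extremes map to exactly 0 and 1, so invnormal() is missing there
generate z_s = invnormal(mpg_s)
count if missing(z_s)

* plotting positions stay strictly inside (0,1)
* (assumes mpg has no missing values, so that _N is the sample size)
egen rk = rank(mpg)
generate p = (rk - 0.5) / _N
generate z = invnormal(p)
count if missing(z)
```

Note that z depends only on the ranks of mpg, so by itself it carries no
information about whether mpg is uniform.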


On Sat, Nov 9, 2013 at 8:31 AM, Nick Cox <[email protected]> wrote:
> Let's take this more slowly. It looks like a case of answering a
> poster's question when the real problem is otherwise.
> 1. I would be interested to learn of examples to the contrary, but the
> hypothesis of a uniform distribution (unqualified) does not seem to arise
> naturally. In contrast, the hypothesis that a variable is uniform on
> some interval [a, b] does arise and in that case a, b are known
> constants that follow from the nature of the variable.
> 2. Panos wants to scale values by (value - max) / (max - min) to [0,1]
> which amounts to arguing that the uniform being tested for has known
> extremes, namely the sample extremes. That needs a story.
> 3. Panos wants to plug the scaled values into -invnormal()-. However,
> -invnormal(0)- and -invnormal(1)- are indeterminate. Usually when
> people plug in probabilities into -invnormal()- they ensure that the
> arguments belong to (0,1), e.g. by using a recipe such as (rank - 0.5)
> / sample size.
> 4. Panos's examples are time series:
> Apri        396      62986
> Aug        330      67503
> Dec        342      65218
> Feb        348   59491.83
> Jan        379   65502.33
> Jul        377    68214.5
> Jun        368   65511.33
> Mar        419   65112.17
> May        423   66152.34
> Nov        328   65107.67
> Oct        347   68344.16
> Sep        356   67597.34
> What these variables are is not made clear, but my guess is that the
> problem is not about testing uniformity of distribution at all, but
> about testing for seasonality, which is a quite different problem.
> Ignoring the serial order is pointless in that case; it is a vital
> part of the information.
> 5. Regardless of whether that guess about the real problem is correct,
> Panos can't assume _independence_ of observations willy-nilly; that is
> an assumption that has to be justified.
> Whatever the answer to (4), a P-value from e.g. Shapiro-Wilk can't be
> taken very seriously here because of the fudges involved in
> translating the original problem to a quite different one.
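> A hedged sketch of point 1, assuming simulated data and hypothetical
> known bounds a = 0 and b = 10 (none of this comes from Panos's data): a
> one-sample Kolmogorov-Smirnov test against the uniform on [a, b],
> followed by a -quantile- plot. Like any such test, it presupposes
> independent observations (point 5).
>
> ```stata
> clear
> set seed 2013
> set obs 200
>
> * hypothetical known bounds of the interval
> scalar a = 0
> scalar b = 10
>
> * simulated draws from the uniform on [a, b]
> generate x = scalar(a) + (scalar(b) - scalar(a)) * runiform()
>
> * one-sample KS test: the expression is the hypothesized cdf,
> * F(x) = (x - a) / (b - a)
> ksmirnov x = (x - scalar(a)) / (scalar(b) - scalar(a))
>
> * ordered values against uniform quantiles; points near the reference
> * line are consistent with a uniform distribution
> quantile x
> ```
>
> The bounds enter as constants fixed in advance, not as the sample
> minimum and maximum.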
> Nick
> [email protected]
> On 9 November 2013 12:58, Nikos Kakouros <[email protected]> wrote:
>> Fernando,
>> That seems to work pretty well (did a run below).
>> I'm not entirely sure why it should work though.
>> Is it because the normal distribution in this case works as an
>> approximation to the binomial distribution?
>> Nikos
>> set obs 50000
>> gen test=runiform()
>> sort test
>> histogram test
>> gen n_test=invnormal(test)
>> histogram  n_test, normal
>> swilk  n_test
>> On Fri, Nov 8, 2013 at 3:58 PM, Fernando Rios Avila <[email protected]> wrote:
>>> What about standardizing the variable toward an index from 0 to 1.
>>> say:
>>> sum mpg
>>> gen mpg_s=(mpg-r(min))/(r(max)-r(min))
>>> Transform it into a normal
>>> gen n_mpg_s=invnormal(mpg_s)
>>> and then make a normality test of this variable
>>> sktest n_mpg_s
>>> HTH
>>> Fernando
>>> On Fri, Nov 8, 2013 at 3:53 PM, Nick Cox <[email protected]> wrote:
>>>> -egen, count()- on a variable just puts a constant in a variable,
>>>> namely the number of non-missing values, which is useless for your
>>>> purpose.
>>>> The best test of uniformity is graphical: -quantile- by accident if
>>>> not design yields the appropriate graph. Otherwise think of
>>>> chi-square, Kolmogorov-Smirnov, etc.
>>>> For "STATA" read "Stata".
>>>> Nick
>>>> [email protected]
>>>> On 8 November 2013 18:09, PAPANIKOLAOU P. <[email protected]> wrote:
>>>>> I am a fairly new user to STATA. I have got to check whether each of
>>>>> these two variables (column 2: MS_COHO; column 3: UK_MS) follows the
>>>>> uniform distribution.
>>>>> For each of them, I used the following code, properly adjusted:
>>>>> egen n = count (mpg)  // use MS_COHO and UK_MS each time ... drop n i
>>>>> surprisingly, the results were identical in both attempts, though the
>>>>> script was applied to two different variables.
>>>>> MONTH  MS_COHO     UK_MS
>>>>> Apri       396     62986
>>>>> Aug        330     67503
>>>>> Dec        342     65218
>>>>> Feb        348  59491.83
>>>>> Jan        379  65502.33
>>>>> Jul        377   68214.5
>>>>> Jun        368  65511.33
>>>>> Mar        419  65112.17
>>>>> May        423  66152.34
>>>>> Nov        328  65107.67
>>>>> Oct        347  68344.16
>>>>> Sep        356  67597.34
