Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: uniform distribution
From: Nikos Kakouros <[email protected]>
To: [email protected]
Subject: Re: st: uniform distribution
Date: Sat, 9 Nov 2013 08:40:53 -0500
Dear Nick,
Incisive as always! Uniformity is certainly unnatural!
The issues of the assumed constraints, the indeterminate values at 0
and 1 due to the asymptotic normal quantile function, and the
importance of value order are real fudges...
My only comment would be that the scaling is by (value - min) /(max -
min) ... :) Just pulling your leg.
Nikos
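
For readers who want to see the endpoint problem concretely, here is a minimal sketch, not anyone's recommended test. It assumes the MS_COHO values from the table quoted further down, typed in by hand as a variable called x. The names x_s, z_minmax, rnk, p and z_rank are made up for illustration. It contrasts min-max scaling, which sends the sample extremes to exactly 0 and 1, with the (rank - 0.5) / sample size recipe Nick mentions below.

```stata
* Sketch only: x stands in for MS_COHO, typed in from the thread's table
clear
input float x
396
330
342
348
379
377
368
419
423
328
347
356
end

* Min-max scaling: the smallest value maps to 0 and the largest to 1
summarize x, meanonly
generate double x_s = (x - r(min)) / (r(max) - r(min))
generate double z_minmax = invnormal(x_s)

* The two extremes should come back as missing from -invnormal()-
count if missing(z_minmax)

* Plotting positions (rank - 0.5) / n stay strictly inside (0, 1)
egen double rnk = rank(x)
generate double p = (rnk - 0.5) / _N
generate double z_rank = invnormal(p)
count if missing(z_rank)
```

Note that the rank-based scores depend only on the ordering of the values (ties aside), so a normality test on z_rank says nothing about whether x is uniform. The sketch only illustrates why the endpoints are a problem.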
On Sat, Nov 9, 2013 at 8:31 AM, Nick Cox <[email protected]> wrote:
> Let's take this more slowly. It looks like a case of answering a
> poster's question when the real problem is otherwise.
>
> 1. I would be interested to learn of examples to the contrary, but the
> hypothesis of a uniform distribution (unqualified) does not seem to arise
> naturally. In contrast, the hypothesis that a variable is uniform on
> some interval [a, b] does arise and in that case a, b are known
> constants that follow from the nature of the variable.
>
> 2. Panos wants to scale values by (value - max) / (max - min) to [0,1]
> which amounts to arguing that the uniform being tested for has known
> extremes, namely the sample extremes. That needs a story.
>
> 3. Panos wants to plug the scaled values into -invnormal()-. However,
> -invnormal(0)- and -invnormal(1)- are indeterminate. Usually when
> people plug probabilities into -invnormal()- they ensure that the
> arguments belong to (0,1), e.g. by using a recipe such as (rank - 0.5)
> / sample size.
>
> 4. Panos's examples are time series
>
> MONTH  MS_COHO  UK_MS
> Apri   396      62986
> Aug    330      67503
> Dec    342      65218
> Feb    348      59491.83
> Jan    379      65502.33
> Jul    377      68214.5
> Jun    368      65511.33
> Mar    419      65112.17
> May    423      66152.34
> Nov    328      65107.67
> Oct    347      68344.16
> Sep    356      67597.34
>
> What these variables are is not made clear, but my guess is that the
> problem is not about testing uniformity of distribution at all, but
> about testing for seasonality, which is a quite different problem.
> Ignoring the serial order is pointless in that case; it is a vital
> part of the information.
>
> 5. Regardless of whether that guess about the real problem is correct,
> Panos can't assume _independence_ of observations willy-nilly; that is
> an assumption that has to be justified.
>
> Whatever the answer to (4), a P-value from e.g. Shapiro-Wilk can't be
> taken very seriously here because of the fudges involved in
> translating the original problem to a quite different one.
>
> Nick
> [email protected]
>
>
> On 9 November 2013 12:58, Nikos Kakouros <[email protected]> wrote:
>> Fernando,
>>
>> That seems to work pretty well (did a run below).
>> I'm not entirely sure why it should work though.
>>
>> Is it because the normal distribution in this case works as an
>> approximation to the binomial distribution?
>>
>> Nikos
>>
>>
>>
>> set obs 50000
>> gen test=runiform()
>> sort test
>> histogram test
>> gen n_test=invnormal(test)
>> histogram n_test, normal
>> swilk n_test
>>
>>
>>
>> On Fri, Nov 8, 2013 at 3:58 PM, Fernando Rios Avila <[email protected]> wrote:
>>> What about standardizing the variable to an index from 0 to 1?
>>> Say:
>>> sum mpg
>>> gen mpg_s=(mpg-r(min))/(r(max)-r(min))
>>> Transform it into a normal
>>> gen n_mpg_s=invnormal(mpg_s)
>>> and then make a normality test of this variable
>>> sktest n_mpg_s
>>> HTH
>>> Fernando
>>>
>>> On Fri, Nov 8, 2013 at 3:53 PM, Nick Cox <[email protected]> wrote:
>>>> -egen, count()- on a variable just puts a constant in a variable,
>>>> namely the number of non-missing values, which is useless for your
>>>> purpose.
>>>>
>>>> The best test of uniformity is graphical: -quantile- by accident if
>>>> not design yields the appropriate graph. Otherwise think of
>>>> chi-square, Kolmogorov-Smirnov, etc.
>>>>
>>>> For "STATA" read "Stata".
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 8 November 2013 18:09, PAPANIKOLAOU P. <[email protected]> wrote:
>>>>
>>>>> I am a fairly new user to STATA. I have got to check whether each of
>>>>> these two variables (column 2: MS_COHO; column 3: UK_MS) follows the
>>>>> uniform distribution.
>>>>> For each of them, I used the following code, properly adjusted:
>>>>>
>>>>> egen n = count (mpg) // use MS_COHO and UK_MS each time ... drop n i
>>>>> surprisingly, the results were identical in both attempts, though the
>>>>> script was applied to two different variables.
>>>>> MONTH  MS_COHO  UK_MS
>>>>> Apri   396      62986
>>>>> Aug    330      67503
>>>>> Dec    342      65218
>>>>> Feb    348      59491.83
>>>>> Jan    379      65502.33
>>>>> Jul    377      68214.5
>>>>> Jun    368      65511.33
>>>>> Mar    419      65112.17
>>>>> May    423      66152.34
>>>>> Nov    328      65107.67
>>>>> Oct    347      68344.16
>>>>> Sep    356      67597.34
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/