Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[no subject]

I am not sure where the 14 came from.  A result from the theory of
order statistics may be useful.  If we start with a sample of n
observations from the uniform distribution on (0, 1),  x1, x2, ...,
xn, and arrange the observations in nondecreasing order, we obtain the
order statistics of the sample, often denoted by x(1) <= x(2) <= ...
<= x(n).  For the ith order statistic in a sample of n from Uniform(0,
1), the expected value (mean) is i/(n + 1).  See, for example, David and
Nagaraja (2003).  Thus, you could plot your ordered observations
against those values.  A reasonable alternative plotting value for
x(i) is (i - (1/3))/(n + (1/3)), which (except for i = 1 and i = n) is
a close approximation to the median of the sampling distribution of
x(i).
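
To make the two sets of plotting values concrete, here is a minimal Python sketch (not Stata) that tabulates i/(n + 1) and (i - 1/3)/(n + 1/3) for n = 15.  The function name plotting_positions is only illustrative; nothing here comes from the original data.

```python
def plotting_positions(n):
    """Return two lists of plotting values for i = 1, ..., n.

    mean_pos[i-1]   = i / (n + 1), the expected value of the ith
                      order statistic from Uniform(0, 1)
    median_pos[i-1] = (i - 1/3) / (n + 1/3), an approximation to the
                      median of the ith order statistic
    """
    mean_pos = [i / (n + 1) for i in range(1, n + 1)]
    median_pos = [(i - 1 / 3) / (n + 1 / 3) for i in range(1, n + 1)]
    return mean_pos, median_pos


# Tabulate both sets of values for a sample of 15.
mean_pos, median_pos = plotting_positions(15)
for i, (m, d) in enumerate(zip(mean_pos, median_pos), start=1):
    print(f"{i:2d}  {m:.4f}  {d:.4f}")
```

The ordered observations would then be plotted against either column; points near the diagonal are consistent with Uniform(0, 1).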

I don't recall seeing the values of your 15 observations, but the
output for -ksmirnov- seems to be saying that your sample contains
more small values than one would expect in a sample of 15 from
Uniform(0, 1) AND more large values than one would expect in such a
sample.
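
As an illustration of how a sample can show an excess at both ends, the following Python sketch computes the two one-sided Kolmogorov-Smirnov distances of a sample from the Uniform(0, 1) CDF.  The ten values are made up for the example (they are not the observations discussed here), and the sketch makes no claim about how -ksmirnov- labels or signs its output lines.

```python
def ks_one_sided_uniform(sample):
    """One-sided K-S distances between a sample and Uniform(0, 1).

    d_plus  = max over i of (i/n - F(x(i)))       (empirical CDF above F)
    d_minus = max over i of (F(x(i)) - (i-1)/n)   (empirical CDF below F)
    where F(x) = x clipped to [0, 1].
    """
    xs = sorted(sample)
    n = len(xs)
    cdf = [min(max(x, 0.0), 1.0) for x in xs]  # Uniform(0, 1) CDF
    d_plus = max(i / n - f for i, f in enumerate(cdf, start=1))
    d_minus = max(f - (i - 1) / n for i, f in enumerate(cdf, start=1))
    return d_plus, d_minus


# Hypothetical sample: half the values near 0, half near 1.
made_up = [0.01, 0.02, 0.03, 0.04, 0.05, 0.95, 0.96, 0.97, 0.98, 0.99]
d_plus, d_minus = ks_one_sided_uniform(made_up)
print(d_plus, d_minus)
```

A large d_plus reflects too many small values (the empirical CDF rises too early), and a large d_minus reflects too many large values (it rises too late); a sample piled up at both ends produces both.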

If you regard your sample as a population and draw single observations
randomly from it and from Uniform(0, 1), it is straightforward to show
that the probability that a random observation from Uniform(0, 1) is
smaller than a random observation from your "population" is equal to
the mean of your "population" (i.e., sample), provided that all of
your values lie between 0 and 1.  For any fixed x in that interval,
P(U < x) = x, and averaging over the sample gives the mean.  To replace
"smaller" with "larger," simply subtract that mean from 1.  It is not
necessary to generate a new sample and use -ranksum-.  Indeed, that
approach introduces additional variability in the result.
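
A short Python sketch of that calculation, assuming every sample value lies in [0, 1].  The five values are hypothetical, and prob_by_grid is only a deterministic numerical check (a fine grid of midpoints standing in for Uniform(0, 1)), not part of the method.

```python
def prob_uniform_smaller(sample):
    """P(U < X), U ~ Uniform(0, 1), X drawn at random from the sample.

    Because P(U < x) = x for x in [0, 1], this is just the sample mean.
    """
    if not all(0.0 <= x <= 1.0 for x in sample):
        raise ValueError("all sample values must lie in [0, 1]")
    return sum(sample) / len(sample)


def prob_by_grid(sample, m=1000):
    """Approximate the same probability using m equally spaced midpoints."""
    grid = [(k + 0.5) / m for k in range(m)]
    below = sum(1 for x in sample for u in grid if u < x)
    return below / (m * len(sample))


made_up = [0.05, 0.20, 0.50, 0.90, 0.95]     # hypothetical values
p_smaller = prob_uniform_smaller(made_up)     # the sample mean
p_larger = 1 - p_smaller                      # "larger" instead of "smaller"
print(p_smaller, p_larger, prob_by_grid(made_up))
```

No random draws are involved, so the result carries none of the extra variability that comes from generating a comparison sample.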

David Hoaglin

H.A. David and H.N. Nagaraja (2003). Order Statistics, 3rd ed.
Hoboken, NJ: Wiley.

On Sat, Mar 9, 2013 at 8:49 AM, Tsankova, Teodora wrote:
> Dear David,
>
> Thank you for the suggestion.
>
> What I mean is that I create a uniform distribution between 0 and 1 with
> 15 observations. Given that every value should have the same probability
> under a uniform distribution I divide 1 by 14 and create those equally
> spaced 15 values. Plotting the CDF of those values would result in a
> straight diagonal line which is ultimately what the ksmirnov test would
> test against as well.
>
> The output from the ksmirnov test is as follows:
>
> ksmirnov mean_random_BTWGr_Fx=uniform()
>
> One-sample Kolmogorov-Smirnov test against theoretical distribution
>            uniform()
>
>  Smaller group       D       P-value  Corrected
>  ----------------------------------------------
>  mean_ra~r_Fx:       0.8221    0.000
>  Cumulative:        -0.8983    0.000
>  Combined K-S:       0.8983    0.000      0.000
>
> So, it seems that although I can reject the equality of the two
> distributions, I cannot say anything about which one tends to have
> larger values.
>
> In Stata the -porder- option of the ranksum command gives the
> probability that a random draw from the first sample is larger than a
> random draw from the second sample. I like this as it seems very
> intuitive. I use those constructed values to perform this test. My
> results are as follows:
>
> ranksum mean_random_BTWGr_Fx, by( ObservedORUniform) porder
>
> Two-sample Wilcoxon rank-sum (Mann-Whitney) test
>
> ObservedOR~m |      obs    rank sum    expected
> -------------+---------------------------------
>     Observed |       15         259       232.5
>      Uniform |       15         206       232.5
> -------------+---------------------------------
>     combined |       30         465         465
>
> unadjusted variance      581.25
> adjustment for ties        0.00
>                      ----------
> adjusted variance        581.25
>
> Ho: mea~r_Fx(Observ~m==Observed) = mea~r_Fx(Observ~m==Uniform)
>              z =   1.099
>     Prob > |z| =   0.2717
>
> P{mea~r_Fx(Observ~m==Observed) > mea~r_Fx(Observ~m==Uniform)} = 0.618
>
> Those results, although not very strong, seem much easier to interpret.
>
> Thank you again,
>
> Teodora