

I am not sure where the 14 came from.

A result from the theory of order statistics may be useful. If we start with a sample of n observations from the uniform distribution on (0, 1), x1, x2, ..., xn, and arrange the observations in nondecreasing order, we obtain the order statistics of the sample, often denoted by x(1) <= x(2) <= ... <= x(n). For the ith order statistic in a sample of n from Uniform(0, 1), the average value is i/(n + 1). See, for example, David and Nagaraja (2003). Thus, you could plot your ordered observations against those values.

A reasonable alternative plotting value for x(i) is (i - (1/3))/(n + (1/3)), which (except for i = 1 and i = n) is a close approximation to the median of the sampling distribution of x(i).

I don't recall seeing the values of your 15 observations, but the output for -ksmirnov- seems to be saying that your sample contains more small values than one would expect in a sample of 15 from Uniform(0, 1) AND more large values than one would expect in such a sample.

If you regard your sample as a population and draw single observations randomly from it and from Uniform(0, 1), it is straightforward to show that the probability that a random observation from Uniform(0, 1) is smaller than a random observation from your "population" is equal to the mean of your "population" (i.e., sample). To replace "smaller" with "larger," simply subtract that mean from 1.

It is not necessary to generate a new sample and use -ranksum-. Indeed, that approach introduces additional variability in the result.

David Hoaglin

H.A. David and H.N. Nagaraja (2003). Order Statistics, 3rd ed. Hoboken, NJ: Wiley.

On Sat, Mar 9, 2013 at 8:49 AM, Tsankova, Teodora <TsankovT@ebrd.com> wrote:
> Dear David,
>
> Thank you for the suggestion.
>
> What I mean is that I create a uniform distribution between 0 and 1 with
> 15 observations. Given that every value should have the same probability
> under a uniform distribution I divide 1 by 14 and create those equally
> spaced 15 values. Plotting the CDF of those values would result in a
> straight diagonal line which is ultimately what the ksmirnov test would
> test against as well.
>
> The output from the ksmirnov test is as follows:
>
> ksmirnov mean_random_BTWGr_Fx=uniform()
>
> One-sample Kolmogorov-Smirnov test against theoretical distribution
>            uniform()
>
>  Smaller group       D       P-value  Corrected
>  ----------------------------------------------
>  mean_ra~r_Fx:       0.8221    0.000
>  Cumulative:        -0.8983    0.000
>  Combined K-S:       0.8983    0.000      0.000
>
> So, it seems that although I can reject the equality of the two
> distributions, I cannot say anything about which one tends to have
> larger values.
>
> In Stata the -porder- option of the ranksum command gives the
> probability that a random draw from the first sample is larger than a
> random draw from the second sample. I like this as it seems very
> intuitive. I use those constructed values to perform this test. My
> results are as follows:
>
> ranksum mean_random_BTWGr_Fx, by( ObservedORUniform) porder
>
> Two-sample Wilcoxon rank-sum (Mann-Whitney) test
>
> ObservedOR~m |      obs    rank sum    expected
> -------------+---------------------------------
>     Observed |       15         259       232.5
>      Uniform |       15         206       232.5
> -------------+---------------------------------
>     combined |       30         465         465
>
> unadjusted variance      581.25
> adjustment for ties        0.00
>                      ----------
> adjusted variance        581.25
>
> Ho: mea~r_Fx(Observ~m==Observed) = mea~r_Fx(Observ~m==Uniform)
>              z =   1.099
>     Prob > |z| =   0.2717
>
> P{mea~r_Fx(Observ~m==Observed) > mea~r_Fx(Observ~m==Uniform)} = 0.618
>
> Those results, although not very strong, seem much easier to interpret.
>
> Thank you again,
>
> Teodora

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
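
The two sets of plotting values in the reply, and the identity that P(U < X) equals the sample mean, can be illustrated with a short sketch. It is written in Python rather than Stata so it can be run stand-alone. The function names and the three-value example sample are hypothetical, since the 15 observed values were not posted in the thread. The identity relies on every observation lying in [0, 1], which the sketch checks.

```python
import math


def mean_plotting_positions(n):
    """Expected values i/(n + 1) of the order statistics x(1..n) from Uniform(0, 1)."""
    return [i / (n + 1) for i in range(1, n + 1)]


def median_plotting_positions(n):
    """Approximate medians (i - 1/3)/(n + 1/3) of the same order statistics."""
    return [(i - 1 / 3) / (n + 1 / 3) for i in range(1, n + 1)]


def prob_uniform_smaller(sample):
    """P(U < X), with U ~ Uniform(0, 1) and X drawn at random from `sample`.

    For each x in [0, 1], P(U < x) = x, so averaging over the sample
    gives the sample mean. Values outside [0, 1] would break the identity.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    if any(x < 0 or x > 1 for x in sample):
        raise ValueError("all observations must lie in [0, 1]")
    return math.fsum(sample) / len(sample)


if __name__ == "__main__":
    n = 15
    print("i/(n+1):            ", [round(p, 4) for p in mean_plotting_positions(n)])
    print("(i-1/3)/(n+1/3):    ", [round(p, 4) for p in median_plotting_positions(n)])

    # Hypothetical sample, for illustration only (not the data from the thread).
    example = [0.2, 0.4, 0.9]
    p_smaller = prob_uniform_smaller(example)
    print("P(U < X) =", p_smaller, " P(U > X) =", 1 - p_smaller)
```

Plotting the sorted observations against either list of positions gives the comparison with the diagonal described in the reply. Note that i/(n + 1) divides by n + 1 = 16 for a sample of 15, not by 14.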
