Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Re:Interesting results from a simulation


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Re:Interesting results from a simulation
Date   Mon, 21 Mar 2011 20:43:51 -0400

There were some typos in my derivation yesterday, corrected below. I've expanded the explanation a bit.

The patterns Victor observed follow from the theory of simple random sampling of n elements with replacement from a population of size N.

p = probability that a specified element is selected on a single draw = 1/N, so 1-p = 1-1/N = (N-1)/N.

pk = P{draw a specified element exactly k times in n draws} =
comb(n,k) * p^k * (1-p)^(n-k)   (binomial probability; comb(n,k) = n!/(k!(n-k)!))
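The binomial formula above can be written as a short function. This is a minimal sketch in Python rather than Stata; the function name binom_prob and the small example with N = 10 are illustrative assumptions, not part of the original post.

```python
from math import comb

def binom_prob(n: int, k: int, p: float) -> float:
    """P{a specified element is drawn exactly k times in n draws}."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative example (assumed values): population of N = 10, n = N draws.
N = 10
probs = [binom_prob(N, k, 1 / N) for k in range(N + 1)]

# The probabilities over k = 0..n should sum to 1, up to rounding error.
total = sum(probs)
print(f"sum of p0..p{N} = {total:.12f}")
```

Summing over all possible values of k is a quick sanity check that the formula has been entered correctly.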

Applying this formula with n = N:

p0 = P{draw 0 times} =
(1-1/n)^n ~ exp(-1) = .3679 when n is large

p1 = P{draw 1 time} =
comb(n,1) * (1/n) * ((n-1)/n)^(n-1) = n * (1/n) * ((n-1)/n)^(n-1) = p0 * n/(n-1) ~ .3679

p2 = P{draw 2 times} =
comb(n,2) * (1/n)^2 * ((n-1)/n)^(n-2) = n*(n-1)/2 * (1/n)^2 * ((n-1)/n)^(n-2) = p1/2

p3 = P{draw 3 times} =
comb(n,3) * (1/n)^3 * ((n-1)/n)^(n-3) = n*(n-1)*(n-2)/6 * (1/n)^3 * ((n-1)/n)^(n-3) = p2 * (n-2)/(3*(n-1)) ~ p2/3.
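The limiting values of p0, p1, p2, p3 for large n are exp(-1)/k!, the probabilities of a Poisson distribution with mean 1. That is a standard result, not something stated in the post itself. The sketch below, in Python with hypothetical helper names p_exact and p_limit, compares the exact binomial values with that limit for n = N = 30000, the size used in Victor's simulation.

```python
from math import comb, exp, factorial

def p_exact(n: int, k: int) -> float:
    """Exact binomial probability of exactly k draws when n = N and p = 1/n."""
    return comb(n, k) * (1 / n)**k * (1 - 1 / n)**(n - k)

def p_limit(k: int) -> float:
    """Large-n limit exp(-1)/k! (Poisson with mean 1)."""
    return exp(-1) / factorial(k)

n = 30000
exact = [p_exact(n, k) for k in range(5)]
limit = [p_limit(k) for k in range(5)]

for k in range(5):
    print(f"k = {k}: exact = {exact[k]:.6f}   exp(-1)/k! = {limit[k]:.6f}")
```

Because the limit has k! in the denominator, each probability is roughly the previous one divided by k, which is the 1/2, 1/3, ... pattern.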

The patterns hold pretty well for n = N as low as 100, at least for the first few values of k:
p0 = 0.3660
p1 = 0.3697
p2 = 0.1849  p1/p2 = 2.00
p3 = 0.0610  p2/p3 = 3.03
p4 = 0.0149  p3/p4 = 4.09
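The n = N = 100 figures above can be recomputed directly from the binomial formula. The following Python lines are a minimal sketch; the variable names are illustrative. Ratios are computed from the unrounded probabilities, so the last digit can differ slightly from a ratio of the rounded values listed above.

```python
from math import comb

n = 100  # n = N = 100, so p = 1/n
p = [comb(n, k) * (1 / n)**k * (1 - 1 / n)**(n - k) for k in range(5)]

for k, pk in enumerate(p):
    # From k = 2 on, also show the ratio of the previous probability to this one.
    ratio = f"  p{k - 1}/p{k} = {p[k - 1] / pk:.2f}" if k >= 2 else ""
    print(f"p{k} = {pk:.4f}{ratio}")
```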


Steve
[email protected]



On Mar 20, 2011, at 4:18 PM, Victor Zammit wrote:

Dear Statalisters,

I have simulated drawing one observation at a time, at random and with replacement, 30,000 times from a finite population of 30,000 observations. The same process was repeated 200 times. I then counted how many times each observation was drawn and found that, after 30,000 draws, the probability that any given observation is drawn 0 times (c0) is ~36.756%, the probability that it is drawn 1 time (c1) is ~36.83%, and the probability that it is drawn 2 times (c2) is 1/2 as much, i.e. ~18.4%. The probability that it is drawn 3 times (c3) is about 1/3 of 18.4%, i.e. ~6.11%, and the pattern basically continues, as the summary at the bottom demonstrates.
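One replication of the experiment described above might be sketched as follows. This is Python, not the Stata code actually used (which is not shown in the thread); the function name one_replication and the seed are assumptions made only for illustration.

```python
import random
from collections import Counter

def one_replication(N: int, rng: random.Random) -> Counter:
    """Draw N times with replacement from elements 0..N-1.

    Returns a Counter mapping k to the number of elements drawn exactly k times.
    """
    draws = Counter(rng.randrange(N) for _ in range(N))  # element -> times drawn
    freq = Counter(draws.values())                       # k -> count, for k >= 1
    freq[0] = N - len(draws)                             # elements never drawn
    return freq

rng = random.Random(12345)  # arbitrary seed, chosen only for reproducibility
N = 30000
freq = one_replication(N, rng)

for k in sorted(freq):
    print(f"c{k} = {freq[k]:6d}   proportion = {freq[k] / N:.4f}")
```

Calling one_replication 200 times and collecting the counts would give a dataset of the same shape as the one summarized below.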



   c0     c1    c2    c3   c4   c5  c6  c7  c8  c9  c10    n
11018  11054  5549  1807  450  107  13   2   0   0    0    1
11133  10980  5403  1884  471  102  21   6   0   0    0    2
11018  11054  5549  1807  450  107  13   2   0   0    0    3
  ...
10886  11273  5482  1797  453   97  11   0   0   1    0  134
11026  11013  5554  1847  472   78  10   0   0   0    0  135
11019  11030  5574  1805  465   96  10   1   0   0    0  136
  ...
11037  10981  5610  1823  443   86  13   6   1   0    0  199
11011  11032  5583  1827  439   85  21   2   0   0    0  200

PS: If anyone is interested, I can provide the complete dataset. Please let me know.

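. summ

    Variable |    Obs        Mean  Std. Dev.        Min        Max
-------------+----------------------------------------------------
          c0 |    200    11026.87   61.53633      10886      11184
          c1 |    200    11050.32   82.50102      10839      11273
          c2 |    200    5520.295   69.39095       5366       5672
          c3 |    200    1834.235   40.62328       1761       1930
          c4 |    200     457.595   15.43835        415        502
-------------+----------------------------------------------------
          c5 |    200      91.255   10.32536         72        117
          c6 |    200       16.56   4.544188          7         25
          c7 |    200       2.675   2.039651          0          7
          c8 |    200        .195   .3971949          0          1
          c9 |    200         .01   .0997484          0          1
-------------+----------------------------------------------------
         c10 |    200           0          0          0          0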

Dividing the variables c0-c10 by 30000 to get the corresponding proportions and running summ again gives:

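. summ

    Variable |    Obs        Mean  Std. Dev.        Min        Max
-------------+----------------------------------------------------
          c0 |    200    .3675622   .0020512   .3628667      .3728
          c1 |    200    .3683438     .00275      .3613   .3757667
          c2 |    200    .1840098    .002313   .1788667   .1890667
          c3 |    200    .0611412   .0013541      .0587   .0643333
          c4 |    200    .0152532   .0005146   .0138333   .0167333
-------------+----------------------------------------------------
          c5 |    200    .0030418   .0003442      .0024      .0039
          c6 |    200     .000552   .0001515   .0002333   .0008333
          c7 |    200    .0000892    .000068          0   .0002333
          c8 |    200    6.50e-06   .0000132          0   .0000333
          c9 |    200    3.33e-07   3.32e-06          0   .0000333
-------------+----------------------------------------------------
         c10 |    200           0          0          0          0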
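The mean proportions in the output above can be set beside the exact binomial probabilities for n = N = 30000. The Python sketch below copies the observed means for c0 to c5 from that output; the variable names are illustrative assumptions.

```python
from math import comb

N = 30000
# Mean proportions for c0..c5, copied from the summ output above.
observed = [.3675622, .3683438, .1840098, .0611412, .0152532, .0030418]

# Exact binomial probability of exactly k draws in N draws with p = 1/N.
theory = [comb(N, k) * (1 / N)**k * (1 - 1 / N)**(N - k)
          for k in range(len(observed))]
diffs = [obs - th for obs, th in zip(observed, theory)]

for k, (obs, th, d) in enumerate(zip(observed, theory, diffs)):
    print(f"c{k}: observed = {obs:.7f}  binomial = {th:.7f}  diff = {d:+.7f}")
```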

Note: the largest number of times that a given observation was drawn in one set of 30000 draws was 9, which happened in the 134th loop of my experiment.

I find the probability pattern quite surprising. Can anyone provide any intuition on this? Why is the probability of an observation not being drawn equal to that of it being drawn exactly once (= ~.368) when the number of trials equals the size of the population?

Victor Zammit.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



