Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Expert Opinion - An alternative to rndbin


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Expert Opinion - An alternative to rndbin
Date   Mon, 24 Sep 2007 19:23:53 +0100

-rndbin-
========

I posted at length on -rndbin- a few days ago in a post which
Tiago acknowledged. However, it seems that I didn't make myself
very clear. The only reason I can think of for using -rndbin- is that you are not aware of an alternative. (That could be decisive; many users would understandably grudge the time taken to work out an alternative, especially if they are not fluent in Stata. Life is short, and so forth.)

But I gave direct alternatives in my posting. By comparison,
-rndbin- is slow, it is incompletely documented, and it can bite you
because of the way that it messes with -set-. It also requires you
to -rename- its result whenever you only want more than one
binomial random variable in memory.

Bernoulli
=========

In Tiago's example here the binomial collapses to the Bernoulli
and simulation similarly collapses to

. gen xb = uniform() < 0.25

To repeat the moral of last week's posting, no program is
necessary when you can do it directly. (And in this case, I see no
gain in using Mata.) The number of events (whatever is coded 1) is
then quickly accessible as r(sum) after

. su xb, meanonly

or using -count-, as in Tiago's post.

Coding directly is always going to be faster than -rndbin-.
My experiments indicate that it is about 5 times faster than
-rndbin-. That could matter to you but it is far less of a
problem than Tiago implies. If Tiago is experiencing a difference
between seconds and hours, then something else is biting, perhaps
a memory problem, but not the speed of -rndbin-. -rndbin-
really isn't that slow, just slower than it need be.

Normal approximation
====================

Tiago quotes the normal approximation to the binomial. I am not
clear how this is relevant. If you know, or assume, something is
binomial, then its mean, SD and other properties follow directly
and there is no need for simulation. If you want the random
numbers (0s and 1s) directly, then as the Bernoulli is (very!)
discrete, random normal deviates would seem inappropriate.

Normal randoms are in any case no faster to get than Bernoulli
randoms, so saving time is not an issue.

Nick
n.j.cox@durham.ac.uk

Tiago Pereira

I would like to know your expert opinion about a faster approach to
generate binomial random numbers using Stata. I wish to know if the
approach below is valid and if it makes some sense.

Well, for example, assume we have _N=1. Then, we have 25000 observations
and p= 0.25, which is the probability of the event. -rndbin- is quite
straighforward:

rndbin 25000 0.25 1
qui count if xb==1

However, I have to run -rndbin- millions of times, say, 10^15 times, count
the number of events (xb=1) and then summarize it to get a new variable.
That approach takes a lot of time and even using Mata functions this is
time-consuming.

Taking statistical aspects of the binomial distribution into account, may
I approximate that calculation using the following approach?

p = 0.25
observations = 25000
sd_of_p_hat = standard deviation of the p_hat

gene sd_of_p_hat= sqrt(((p)*(1-p))/(observations))
generate z = invnorm(uniform())
replace p = (z)*(sd_of_p_hat)+(p)
gene number_of_events= round(p*observations)

The latter approach is really faster (2-3 seconds for 100000 studies,
whereas -rndbin- are likely to take some hours, at least in my PC) and it
is likely to be unbiased for pīs between 0.3 and 0.7, the range I have to
work with in Human Genetics.

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index