# RE: st: RE: Hypergeometric Distribution

From: "Nick Cox" | Subject: RE: st: RE: Hypergeometric Distribution | Date: Thu, 23 Aug 2007 10:25:03 +0100

```
Consider

comb(K, k) * comb(N - K, n - k) / comb(N, n)

When I look at that, my main worry would be that
the numerator could get rather large before it
is scaled down by the denominator. Hence I
would try

exp(ln(comb(K, k)) + ln(comb(N - K, n - k)) - ln(comb(N, n)))

as a check. I know that 0s will become missings, but
it's my understanding that in such cases the resulting
probabilities should all be 0 in any case.

It may be that any StataCorp function, say

hyperg(N, n, K, k),

would just be this underneath. But perhaps not.

Nick
n.j.cox@durham.ac.uk

Marcello Pagano

> The hypergeometric plays a central role when sampling from a
> finite population.  The binomial provides an approximation for large
> samples, but why rely on approximations today when they are not
> necessary?  And how good is the approximation, anyway?  Possibly the
> reliance on the approximation provided by the binomial has lulled us
> into a complacency that contributed to the "evidence since 1999"?
>
> I did research a little with -comb( )- and that works pretty well,
> but I did a very limited study.  A Stata function with all its usual
> associated robustness and accuracy would be nice, in my opinion.

Nick Cox

> >>>>> Roger's posting includes what I presume is an allusion to
> >>>>> an -egen- function _ghyper.ado that I wrote in 1999.
> >>>>>
> >>>>> I withdrew this program as redundant some years ago,
> >>>>> given that you can use something like
> >>>>>
> >>>>> comb(K, k) * comb(N - K, n - k) / comb(N, n)
> >>>>>
> >>>>> wherever you want. In context N, K, n, k may be
> >>>>> variables, scalars or placeholders for numeric
> >>>>> constants, or any mixture thereof.
> >>>>>
> >>>>> This might need a wrapper to yield zeros where
> >>>>> appropriate, or it might need care whenever
> >>>>> individual terms get very large, but otherwise
> >>>>> does it raise any problems?

Marcello Pagano

> >>>>>> Does anyone have or know of Stata code to calculate the
> >>>>>> Hypergeometric Distribution accurately?
> >>>>>>
> >>>>>> See Journal of Discrete Algorithms, Volume 5, Issue 2
> >>>>>> (June 2007), Pages: 341-347 for an article by Berkopec, HyperQuick
> >>>>>> algorithm for discrete hypergeometric distribution
> >>>>>> <http://portal.acm.org/citation.cfm?id=1240586&coll=GUIDE&dl=GUIDE&CFID=

```
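
The two expressions discussed in the thread can be compared side by side in a short do-file. What follows is a minimal sketch, not a StataCorp implementation. The local macro names and the example values (N = 50, K = 5, n = 10, k = 1) are assumptions chosen purely for illustration, and the `cond(missing(...), 0, ...)` step is only one possible reading of the "wrapper to yield zeros" mentioned in the quoted message.

```stata
* Illustrative values only (assumed, not taken from the thread)
local N 50
local K 5
local n 10
local k 1

* Direct form: the numerator is formed in full before the division
scalar p_direct = comb(`K', `k') * comb(`N' - `K', `n' - `k') / comb(`N', `n')

* Log-scale form: add and subtract logs, then exponentiate once
scalar p_log = exp(ln(comb(`K', `k')) + ln(comb(`N' - `K', `n' - `k')) - ln(comb(`N', `n')))

* ln(0) is missing in Stata, so map a missing result back to 0
* (assumes missing arises only from a zero term, not from overflow)
scalar p_log0 = cond(missing(p_log), 0, p_log)

display "direct:    " p_direct
display "log scale: " p_log0
```

For moderate arguments the two results should agree to within floating-point rounding. The comparison becomes informative when the arguments are large enough that the product in the direct form could overflow before it is divided.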