
Re: st: RE: Hypergeometric Distribution

From   Marcello Pagano
Subject   Re: st: RE: Hypergeometric Distribution
Date   Wed, 22 Aug 2007 20:12:23 -0400

The hypergeometric plays a central role when sampling from a finite population. The binomial provides an approximation for large samples, but why rely on approximations today when they are not necessary? And how good is the approximation, anyway? Possibly reliance on the approximation provided by the binomial has lulled us into a complacency that contributed to the "evidence since 1999"?

I did a little research with -comb()-, and it works pretty well, but my study was very limited. A Stata function with all the usual associated robustness and accuracy would be nice, in my opinion.
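
As a rough illustration of the question "how good is the approximation, anyway?", the sketch below compares the hypergeometric probability built from -comb()- with the binomial probability using p = K/N. The scalar names and the numeric values are assumptions chosen for illustration only; they do not come from the thread.

```stata
* Illustrative values only (assumptions, not from the thread):
* population size N, successes in population K,
* sample size n, successes in sample k
scalar N = 100
scalar K = 20
scalar n = 30
scalar k = 6

* Exact hypergeometric probability from binomial coefficients
scalar ph = comb(K, k) * comb(N - K, n - k) / comb(N, n)

* Binomial approximation with success probability K/N
scalar pb = comb(n, k) * (K / N)^k * (1 - K / N)^(n - k)

display "hypergeometric:  " ph
display "binomial:        " pb
display "abs. difference: " abs(ph - pb)
```

Rerunning with n small relative to N, and then with n a large fraction of N, gives a feel for when the two probabilities drift apart.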


Nick Cox wrote:

There are many answers to this, but dinner supervenes.
If you push hard enough, and show why it is needed, either StataCorp or a user will write a program for this.
The evidence since 1999 has been that such a program is not needed.

Marcello Pagano wrote:

Why buy Stata if you are expected to do all this for yourself?

Nick Cox wrote:

Apply -ln()-, -exp()- and -cond()- as needed.
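
A minimal sketch of what this advice might look like in practice: the three binomial coefficients are computed on the log scale with -lnfactorial()-, combined, and exponentiated once at the end, with -cond()- returning zero outside the support. The scalar names and values are illustrative assumptions. This guards against overflow of the individual terms; it is not a claim about accuracy in every case.

```stata
* Illustrative values only (assumptions): large enough that the
* individual comb() terms would exceed the range of a double
scalar N = 5000
scalar K = 2000
scalar n = 1000
scalar k = 400

* ln of comb(K, k), comb(N - K, n - k) and comb(N, n)
scalar lc1 = lnfactorial(K) - lnfactorial(k) - lnfactorial(K - k)
scalar lc2 = lnfactorial(N - K) - lnfactorial(n - k) - lnfactorial(N - K - n + k)
scalar lc3 = lnfactorial(N) - lnfactorial(n) - lnfactorial(N - n)

* k is in the support when max(0, n - (N - K)) <= k <= min(n, K)
scalar lo = max(0, n - (N - K))
scalar hi = min(n, K)

* Exponentiate only the combined log; return 0 outside the support
display cond(inrange(k, lo, hi), exp(lc1 + lc2 - lc3), 0)
```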

Marcello Pagano wrote:

Just concerned with the accuracy.

Nick Cox wrote:

Roger's posting includes what I presume is an allusion to an -egen- function _ghyper.ado that I wrote in 1999.
I withdrew this program as redundant some years ago, given that you can use something like
comb(K, k) * comb(N - K, n - k) / comb(N, n)

wherever you want. In context N, K, n, k may be variables, scalars or placeholders for numeric constants, or any mixture thereof.
This might need a wrapper to yield zeros where appropriate, or it might need care whenever individual terms get very large, but otherwise does it raise any problems?
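
The wrapper mentioned above might be sketched as follows, assuming N is the population size, K the number of successes in the population, n the sample size and k the number of successes in the sample. The scalar values are illustrative assumptions, not taken from the posting.

```stata
* Illustrative values only (assumptions)
scalar N = 50
scalar K = 5
scalar n = 10
scalar k = 2

* Bounds of the support of k
scalar lo = max(0, n - (N - K))
scalar hi = min(n, K)

* The comb() expression inside the support, zero outside it
scalar p = cond(inrange(k, lo, hi), comb(K, k) * comb(N - K, n - k) / comb(N, n), 0)
display p
```

Setting k to a value outside the range lo to hi (for example, scalar k = 7 here) should display 0 rather than a missing value.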

Marcello Pagano wrote:

I looked at -ssizebi-, but it seems to be focused on power and sample sizes.

Newson, Roger B wrote:

Thanks to Marcello for telling us all about this algorithm, which looks very useful. A search on

findit hypergeometric

in Stata finds a single reference (to an SSC package), which was distributed as long ago as 1999. This suggests that the new algorithm might be a good candidate for implementation in Mata by Marcello, or by anybody else with the time and inclination to do so.

Marcello Pagano wrote:

Does anyone have or know of Stata code to calculate the Hypergeometric Distribution accurately?

See Journal of Discrete Algorithms, Volume 5, Issue 2 (June 2007), pages 341-347, for an article by Berkopec, "HyperQuick algorithm for discrete hypergeometric distribution".
