Statalist


Re: st: RE: Hypergeometric Distribution


From   Marcello Pagano <pagano@hsph.harvard.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Hypergeometric Distribution
Date   Thu, 23 Aug 2007 20:25:54 -0400

Thanks Stas.
m.p.

Stas Kolenikov wrote:
I know that the following was a breakthrough paper speeding up
hypergeometric computations quite a bit:
http://math.mit.edu/~plamen/files/hyper.pdf. Marcello, this guy is in
your geographic area, so you can get him out for a coffee or something
to see if there are any fast algorithms to work out in [St|M]ata.

On 8/23/07, Mike Lacy <Michael.Lacy@colostate.edu> wrote:

>Date: Wed, 22 Aug 2007 15:32:38 +0100
>From: "Newson, Roger B" <r.newson@imperial.ac.uk>
>Subject: st: RE: Hypergeometric Distribution
>
>Thanks to Marcello for telling us all about this recently published
>algorithm, which looks very useful. A search on
>
>findit hypergeometric
>
>in Stata finds a single reference (to an SSC package), which was
>distributed as long ago as 1999. This suggests that the new algorithm
>might be a good candidate for implementation in Mata by Marcello, or by
>anybody else with the time and inclination to do so.

I have something similar and could use a collaborator:

I have a Stata program to calculate hypergeometric probabilities
using the algorithm of:

Berry, K. J., & Mielke, P. W. (1983). A rapid FORTRAN subroutine for the
Fisher exact probability test. Educational and Psychological
Measurement, 43, 167-171.

Their algorithm exploits a recursion, and so avoids calculating any
factorials or log factorials when computing the hypergeometric
probabilities. I suspect it is faster than even the newly published
algorithm, although I have not tested that. It is particularly suited to
applications in which the entire vector of probabilities across the
range of the variable is needed (e.g., Fisher's Exact), since it has
to calculate all the probabilities to get just one of them. However,
I believe it can compute all the probabilities for a variable with
lower O() complexity than a conventional algorithm needs to calculate
any single one.
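
To make the recursion idea concrete, here is a minimal Python sketch. It
is not the Berry & Mielke FORTRAN routine and not the Stata program
mentioned in this post; it only illustrates the general technique of
building the whole hypergeometric vector from the ratio of successive
probabilities and then normalizing. The function name and the
parameterization (N = population size, K = successes in the population,
n = draws) are assumptions made for the illustration.

```python
def hypergeom_pmf_all(N, K, n):
    """Return (lo, probs), where probs[i] = P(X = lo + i).

    Illustrative sketch: uses the ratio of successive terms,
        P(x + 1) / P(x) = (K - x)(n - x) / ((x + 1)(N - K - n + x + 1)),
    so no factorials or log factorials are ever evaluated.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")

    lo = max(0, n - (N - K))  # smallest attainable count
    hi = min(n, K)            # largest attainable count

    # Start from an arbitrary weight at lo; only the ratios matter.
    weights = [1.0]
    for x in range(lo, hi):
        ratio = (K - x) * (n - x) / ((x + 1) * (N - K - n + x + 1))
        weights.append(weights[-1] * ratio)

    # Normalize so the weights sum to one.
    # Caveat: for very large tables the unnormalized weights can
    # overflow; starting the recursion at the mode would be safer.
    total = sum(weights)
    return lo, [w / total for w in weights]


if __name__ == "__main__":
    lo, probs = hypergeom_pmf_all(10, 4, 3)
    for i, p in enumerate(probs):
        print(lo + i, p)
```

For N = 10, K = 4, n = 3 the four probabilities should come out close
to 1/6, 1/2, 3/10 and 1/30. The whole vector costs one multiplication
and one division per support point, which is the property that makes
this kind of recursion attractive for Fisher's Exact.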

I am not a "production quality" Stata programmer, and don't want to
take the time to be one, so if anyone else is interested, I'd be
happy to send them my code to be dressed up for public use. I
considered posting the program to the list (only about 40 lines), but
didn't know if that was quite appropriate.




