
From: Marcello Pagano <pagano@hsph.harvard.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Hypergeometric Distribution
Date: Thu, 23 Aug 2007 20:25:54 -0400

Thanks Stas.

m.p.

Stas Kolenikov wrote:

I know that the following was a breakthrough paper speeding up
hypergeometric computations quite a bit:
http://math.mit.edu/~plamen/files/hyper.pdf. Marcello, this guy is in
your geographic area, so you can get him out for a coffee or something
to see if there are any fast algorithms to work out in [St|M]ata.

On 8/23/07, Mike Lacy <Michael.Lacy@colostate.edu> wrote:

>Date: Wed, 22 Aug 2007 15:32:38 +0100
>From: "Newson, Roger B" <r.newson@imperial.ac.uk>
>Subject: st: RE: Hypergeometric Distribution
>
>Thanks to Marcello for telling us all about this recently published
>algorithm, which looks very useful. A search on
>
>findit hypergeometric
>
>in Stata finds a single reference (to a SSC package), which was
>distributed as long ago as 1999. This suggests that the new algorithm
>might be a good candidate for implementation in Mata by Marcello, or by
>anybody else with the time and inclination to do so.

I have something similar and could use a collaborator:

I have a Stata program to calculate hypergeometric probabilities
using the algorithm of:

Berry, K. J., & Mielke, P. W. (1983). A rapid FORTRAN subroutine for
the Fisher exact probability test. Educational and Psychological
Measurement, 43, 167-171.

Their algorithm exploits a recursion, and so avoids the calculation
of any factorials or log factorials in calculating the
hypergeometric. I suspect it is faster than even the newly published
algorithm, although I don't know. It is particularly suited to
applications in which the entire vector of probabilities across the
range of the variable is needed (e.g., Fisher's Exact), since it has
to calculate all the probabilities to get just one of them. However,
it can do all the probabilities for a variable with what I believe is
lower O() complexity than a conventional algorithm needs to calculate
any single one.
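As a rough illustration of the kind of recursion described above, here is a minimal Python sketch. It is not Berry and Mielke's FORTRAN routine and not the Stata program mentioned in this message; the function name `hypergeom_pmf_vector` and its arguments (`N` population size, `K` successes in the population, `n` draws) are hypothetical. It relies only on the standard ratio of successive hypergeometric probabilities, P(x+1)/P(x) = (K-x)(n-x) / ((x+1)(N-K-n+x+1)), and normalizes at the end, so no factorials or log factorials are computed.

```python
def hypergeom_pmf_vector(N, K, n):
    """Return (lo, probs), where probs[i] = P(X = lo + i).

    Illustrative sketch only: N is the population size, K the number of
    successes in the population, n the number of draws (names assumed).
    """
    lo = max(0, n - (N - K))   # smallest attainable count
    hi = min(n, K)             # largest attainable count

    # Unnormalized weights, starting from an arbitrary 1.0 at x = lo.
    weights = [1.0]
    for x in range(lo, hi):
        # Ratio P(x + 1) / P(x); no factorials are needed.
        ratio = (K - x) * (n - x) / ((x + 1) * (N - K - n + x + 1))
        weights.append(weights[-1] * ratio)

    # The probabilities must sum to 1, so normalizing recovers them.
    total = sum(weights)
    return lo, [w / total for w in weights]


# Example: 10 items, 4 successes, 3 draws.
lo, probs = hypergeom_pmf_vector(10, 4, 3)
for offset, p in enumerate(probs):
    print(lo + offset, p)
```

The whole vector costs one pass over the attainable range, which is why this style of calculation suits Fisher's exact test. One caveat of this simple version: for very large populations the unnormalized weights can overflow a floating-point number, so a production routine would need to start the recursion near the mode or rescale along the way.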

I am not a "production quality" Stata programmer, and don't want to
take the time to be one, so if anyone else is interested, I'd be
happy to send them my code to be dressed up for public use. I
considered posting the program to the list (only about 40 lines), but
didn't know if that was quite appropriate.


