[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: implementation of boschloo's test: very slow execution |

Date |
Fri, 22 Feb 2008 18:40:25 -0000 |

I see. My guess is that the multiplication itself is trivial. You might take a pencil and paper to the combinatorics Binomial(A)- Binomial(B) * Binomial(C) -Binomial(D) and see if it boils down to something much simpler. On the other hand, the real problem is possibly just that you are using an interpreted language to do quite a lot of work. Nick n.j.cox@durham.ac.uk Eva Poen Upon reading my post again, I realised that I was not careful enough when simplifying the code for the purpose of sending it to the list. The line involving the binomial: qui replace current = Binomial(n1,`xx1',theta)-Binomial(n1,`=`xx1'+1',theta))*(Binomial(n2,`xx 2',theta)-Binomial(n2,`=`xx2'+1',theta)) should have -theta- replaced by -p-. Therefore, -current- and -PH0sum- are not constant; the binomial product is calculated for all values of -p-, which are 10001 in my case. The p-value of the test is the maximum (well, supremum, actually) of the variable -PH0sum-. It seems to be the case that the most time-consuming thing inside the nested loop is this product of binomial probability mass functions. Is there a way outside Mata to speed this bit up? 2008/2/22, Nick Cox <n.j.cox@durham.ac.uk>: > I don't see that you need hold constants in variables. Use scalars not > variables to hold constants. This refers to -current- and -PHOsum-. > > I have only scanned this quickly, but at first sight it appears that > your variable -p- is never used elsewhere in the code. > > Converting some of this to Mata is perhaps the most obvious speed-up. Eva Poen > I am currently trying to implement the unconditional test for > comparing two binomial proportions by Boschloo. The test is described > e.g. here: http://www.west.asu.edu/rlberge1/papers/comp2x2.pdf (page > 3). > > The problem with my implementation is that it takes a long time to > execute: for n1=148 and n2=132, it takes roughly 30 minutes on a > reasonably modern machine. I am using Stata 9.2 for this. I was > wondering if anyone could suggest improvements to make it faster. > > Here is how the test works: > Say you have two samples of size n1 and n2. In sample 1, there are x1 > successes whereas in sample 2 there are x2. > a) compute fisher's exact test for the two proportions and record the > two-sided p-value. > > b) for every possible combination of successes in the two samples > (which is n1Xn2 combinations): > b1) compute fisher's exact test > b2) check whether the p-value is less than or equal to the p-value > from the original sample in step a). > b3) if it is <= the result from a), compute the product of the > binomial probability mass functions for this combination. Do this for > (a lot) of values for 0 <= p <= 1. If it is > the result from step a) > , do nothing. > > c) The result of step b3) is added up over all n1Xn2 iterations. > > d) Search for the maximum of c) over all values of p. This maximum is > the p-value for this test. > > Here is what I do in every step. > I first create a variable that holds all probability values of p that > I want to do the calculations for: > set obs 10001 > qui gen double p = (_n-1)/10001 > The higher the "resolution" of p, the more accurate is the result going > to be. > > I also create a variable that holds the current calculations, and one > that holds the (running) sum over all iterations: > qui gen double PH0sum = 0 > qui gen double current = . > > a) > qui tabi `=n1-x1' `=n2-x2' \ `=x1' `=x2', exact > scalar fisher = r(p_exact) > > b) and c) > forvalues xx1 = 0/`=n1' { > forvalues xx2 = 0/`=n2' { > qui tabi `=n1-`xx1'' `=n2-`xx2'' \ `xx1' `xx2', exact > scalar fishnew = r(p_exact) > if fishnew <= fisher { > qui replace current = > Binomial(n1,`xx1',theta)-Binomial(n1,`=`xx1'+1',theta))*(Binomial(n2,`xx > 2',theta)-Binomial(n2,`=`xx2'+1',theta)) > qui replace PH0sum = PH0sum + current > } > } > } > > d) > qui sum PH0sum > scalar p_boschloo = r(max) > > ******************************************************** > > I can replicate an example I found in a Biometrics(2003) paper by > Mehrotra et al., so I believe that the calculations are correct. Is > there a way to speed things up inside the nested loop? I'd be greatful > for any hints, as I have to run quite a lot of these tests; time does > matter here. > * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: implementation of boschloo's test: very slow execution***From:*"Eva Poen" <eva.poen@gmail.com>

**References**:**implementation of boschloo's test: very slow execution***From:*"Eva Poen" <eva.poen@gmail.com>

**RE: implementation of boschloo's test: very slow execution***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

**Re: implementation of boschloo's test: very slow execution***From:*"Eva Poen" <eva.poen@gmail.com>

- Prev by Date:
**Re: implementation of boschloo's test: very slow execution** - Next by Date:
**Re: st: RE: Experienced user of GLLAMM in Minneapolis/St.Paul areawanted to give a workshop. Anyone?** - Previous by thread:
**Re: implementation of boschloo's test: very slow execution** - Next by thread:
**Re: implementation of boschloo's test: very slow execution** - Index(es):

© Copyright 1996–2016 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |