# RE: implementation of boschloo's test: very slow execution

 From "Nick Cox" To Subject RE: implementation of boschloo's test: very slow execution Date Fri, 22 Feb 2008 18:18:52 -0000

```I don't see that you need hold constants in variables. Use scalars not
variables to hold constants. This refers to -current- and -PHOsum-.

I have only scanned this quickly, but at first sight it appears that
your variable -p- is never used elsewhere in the code.

Converting some of this to Mata is perhaps the most obvious speed-up.

Nick
n.j.cox@durham.ac.uk

Eva Poen

I am currently trying to implement the unconditional test for
comparing two binomial proportions by Boschloo. The test is described
e.g. here: http://www.west.asu.edu/rlberge1/papers/comp2x2.pdf (page
3).

The problem with my implementation is that it takes a long time to
execute: for n1=148 and n2=132, it takes roughly 30 minutes on a
reasonably modern machine. I am using Stata 9.2 for this. I was
wondering if anyone could suggest improvements to make it faster.

Here is how the test works:
Say you have two samples of size n1 and n2. In sample 1, there are x1
successes whereas in sample 2 there are x2.
a) compute fisher's exact test for the two proportions and record the
two-sided p-value.

b) for every possible combination of successes in the two samples
(which is n1Xn2 combinations):
b1) compute fisher's exact test
b2) check whether the p-value is less than or equal to the p-value
from the original sample  in step a).
b3) if it is <= the result from a), compute the product of the
binomial probability mass functions for this combination. Do this for
(a lot) of values for 0 <= p <= 1. If it is > the result from step a)
, do nothing.

c) The result of step b3) is added up over all n1Xn2 iterations.

d) Search for the maximum of c) over all values of p. This maximum is
the p-value for this test.

Here is what I do in every step.
I first create a variable that holds all probability values of p that
I want to do the calculations for:
set obs 10001
qui gen double p = (_n-1)/10001
The higher the "resolution" of p, the more accurate is the result going
to be.

I also create a variable that holds the current calculations, and one
that holds the (running) sum over all iterations:
qui gen double PH0sum = 0
qui gen double current = .

a)
qui tabi `=n1-x1' `=n2-x2' \ `=x1' `=x2', exact
scalar fisher = r(p_exact)

b) and c)
forvalues xx1 = 0/`=n1' {
forvalues xx2 = 0/`=n2' {
qui tabi `=n1-`xx1'' `=n2-`xx2'' \ `xx1' `xx2', exact
scalar fishnew = r(p_exact)
if fishnew <= fisher {
qui replace current =
Binomial(n1,`xx1',theta)-Binomial(n1,`=`xx1'+1',theta))*(Binomial(n2,`xx
2',theta)-Binomial(n2,`=`xx2'+1',theta))
qui replace PH0sum = PH0sum + current
}
}
}

d)
qui sum PH0sum
scalar p_boschloo = r(max)

********************************************************

I can replicate an example I found in a Biometrics(2003) paper by
Mehrotra et al., so I believe that the calculations are correct. Is
there a way to speed things up inside the nested loop? I'd be greatful
for any hints, as I have to run quite a lot of these tests; time does
matter here.

Best regards,
Eva
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```