Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: implementation of boschloo's test: very slow execution


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: implementation of boschloo's test: very slow execution
Date   Fri, 22 Feb 2008 18:40:25 -0000

I see. 

My guess is that the multiplication itself is trivial. You might take a
pencil and paper to the combinatorics 

Binomial(A)- Binomial(B) * Binomial(C) -Binomial(D) 

and see if it boils down to something much simpler. 

On the other hand, the real problem is possibly just that you are using
an interpreted language to do quite a lot of work. 

Nick 
n.j.cox@durham.ac.uk 

Eva Poen

Upon reading my post again, I realised that I was not careful enough
when simplifying the code for the purpose of sending it to the list.

The line involving the binomial:
qui replace current  =
Binomial(n1,`xx1',theta)-Binomial(n1,`=`xx1'+1',theta))*(Binomial(n2,`xx
2',theta)-Binomial(n2,`=`xx2'+1',theta))

should have -theta- replaced by -p-. Therefore, -current- and -PH0sum-
are not constant; the binomial product is calculated for all values of
-p-, which are 10001 in my case.

The p-value of the test is the maximum (well, supremum, actually) of
the variable -PH0sum-.

It seems to be the case that the most time-consuming thing inside the
nested loop is this product of binomial probability mass functions. Is
there a way outside Mata to speed this bit up?

2008/2/22, Nick Cox <n.j.cox@durham.ac.uk>:

> I don't see that you need hold constants in variables. Use scalars not
>  variables to hold constants. This refers to -current- and -PHOsum-.
>
>  I have only scanned this quickly, but at first sight it appears that
>  your variable -p- is never used elsewhere in the code.
>
>  Converting some of this to Mata is perhaps the most obvious speed-up.

Eva Poen

>  I am currently trying to implement the unconditional test for
>  comparing two binomial proportions by Boschloo. The test is described
>  e.g. here: http://www.west.asu.edu/rlberge1/papers/comp2x2.pdf (page
>  3).
>
>  The problem with my implementation is that it takes a long time to
>  execute: for n1=148 and n2=132, it takes roughly 30 minutes on a
>  reasonably modern machine. I am using Stata 9.2 for this. I was
>  wondering if anyone could suggest improvements to make it faster.
>
>  Here is how the test works:
>  Say you have two samples of size n1 and n2. In sample 1, there are x1
>  successes whereas in sample 2 there are x2.
>  a) compute fisher's exact test for the two proportions and record the
>  two-sided p-value.
>
>  b) for every possible combination of successes in the two samples
>  (which is n1Xn2 combinations):
>    b1) compute fisher's exact test
>    b2) check whether the p-value is less than or equal to the p-value
>  from the original sample  in step a).
>    b3) if it is <= the result from a), compute the product of the
>  binomial probability mass functions for this combination. Do this for
>  (a lot) of values for 0 <= p <= 1. If it is > the result from step a)
>  , do nothing.
>
>  c) The result of step b3) is added up over all n1Xn2 iterations.
>
>  d) Search for the maximum of c) over all values of p. This maximum is
>  the p-value for this test.
>
>  Here is what I do in every step.
>  I first create a variable that holds all probability values of p that
>  I want to do the calculations for:
>     set obs 10001
>     qui gen double p = (_n-1)/10001
>  The higher the "resolution" of p, the more accurate is the result
going
>  to be.
>
>  I also create a variable that holds the current calculations, and one
>  that holds the (running) sum over all iterations:
>     qui gen double PH0sum = 0
>     qui gen double current = .
>
>  a)
>     qui tabi `=n1-x1' `=n2-x2' \ `=x1' `=x2', exact
>     scalar fisher = r(p_exact)
>
>  b) and c)
>     forvalues xx1 = 0/`=n1' {
>         forvalues xx2 = 0/`=n2' {
>             qui tabi `=n1-`xx1'' `=n2-`xx2'' \ `xx1' `xx2', exact
>             scalar fishnew = r(p_exact)
>             if fishnew <= fisher {
>                 qui replace current =
>
Binomial(n1,`xx1',theta)-Binomial(n1,`=`xx1'+1',theta))*(Binomial(n2,`xx
>  2',theta)-Binomial(n2,`=`xx2'+1',theta))
>                 qui replace PH0sum = PH0sum + current
>             }
>         }
>     }
>
>  d)
>  qui sum PH0sum
>  scalar p_boschloo = r(max)
>
>  ********************************************************
>
>  I can replicate an example I found in a Biometrics(2003) paper by
>  Mehrotra et al., so I believe that the calculations are correct. Is
>  there a way to speed things up inside the nested loop? I'd be
greatful
>  for any hints, as I have to run quite a lot of these tests; time does
>  matter here.
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index