Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: sample size calculation for binary repeated measures


From   Laura Gibbons <gibbonsl@u.washington.edu>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   Re: st: sample size calculation for binary repeated measures
Date   Mon, 17 Sep 2007 10:40:16 -0700 (PDT)

Thank you, Joseph, this code was very timely as I am just setting out to do something similar today.

I had to also install "ridder" to get this to work, in addition to ovbd. It's called by ovbd.

Many thanks,

Laura

On Sun, 16 Sep 2007, Joseph Coveney wrote:


Louis Apicella wrote:

I am trying to determine the sample size for a binary repeated measures
study in which we'll collect baseline data from two groups, intervention and
control, then expose the intervention group, followed by collecting data
twice more post-intervention from both groups.

I've been using this code to come up with sample sizes

sampsi .45 .55, sd1(.5) sd2(.5) power(.8) method(ancova) pre(1) post(2)
r1(.5) r01(.5)

My question is that I have only proportions to work with and I'm not sure
that sampsi is designed to use proportions when we specify sd1() and sd2().
I also feel that the variance should be closer to .1 for each sample but
then my sample size is ridiculously small (n=8).  Essentially, what I want
to be able to do is a comparison of the change in the intervention group to
the change in the control group. Any ideas?

--------------------------------------------------------------------------------

From the dialogue screens, it seems that -sampsi- only works with continuous
data for repeated measurements.  The -sd1() sd2()- options wouldn't be
invoked when using -sampsi- with proportions.

There might very well be an analytical method in the literature for power
analysis for your situation; Roger Newson has already mentioned one
approach.  But have you considered using simulation as another?  See
www.stata.com/support/faqs/stat/power.html and
www.stata-press.com/journals/sjabstracts/st0010.pdf for some contributions
by Alan Feiveson on the topic.

If an analytical method that fits the bill is readily available and facile,
then Monte Carlo simulation won't be attractive, but the latter is always
doable and often competitive with a trip to the library (and perhaps even
with JSTOR) with respect to overall time spent.  In your case, you can
assess sensitivity toward assumptions regarding correlation, check out link
functions, and explore population-average verus subject-specific modeling
approaches, as well.  See the do-file below for an example of link function
exploration using GEE with your ANCOVA-like set up and assumptions for
proportions and correlations; it displays power ("Mean" in -summarize-'s
output) for the canonical link function ("A") and identity link function
("B") as it climbs through a series of three candidate treatment group
sample sizes under the alternative hypothesis, and confirms the Type I error
rate (displaying it in the same way as it does power) using the smallest of
the probe per-group sample sizes under the null hypothesis.  (The
illustration do-file requires a user-written command, -ovbd-, that you can
download from the SSC archive.)

With a subject-specific model (-xtlogit , re- or -gllamm-), the simulation
approach will demand increasing patience with increasing integration points.
(It's not included below because the program as originally written
used -xtlogit , re-, but you can often take advantage of -contract- and
[fweight =] to speed things up.)  Although I took a chance in the do-file
below (especially with Model B), putting a limit on the number of iterations
is a good idea when using an iterative analytical method in the simulation
program.

Note that the illustration do-file and output shown below shouldn't be
construed as advocating a risk difference (identity link) model, but if the
observed proportions are all in the neighborhood of 50% as you expect, then
you might be lucky enough to get away with it.

Joseph Coveney

clear *
set more off
set seed `=date("2007-09-16", "YMD")'
*
capture program drop binsimem
program define binsimem, rclass
  version 10
  syntax , n(integer) [Corr(real 0.5) NUll]
  tempname Control Experimental Correlation
  tempfile tmpfil0
  if ("`null'" == "") {
      matrix define `Control' = (0.5, 0.55, 0.55)
      matrix define `Experimental' = (0.5, 0.45, 0.45)
  }
  else {
      matrix define `Control' = (0.5, 0.5, 0.5)
      matrix define `Experimental' = (0.5, 0.5, 0.5)
  }
  matrix define `Correlation' = J(3, 3, `corr') + ///
    I(3) * (1 - `corr')
  drop _all
  ovbd bas res0 res1, means(`Control') ///
    corr(`Correlation') n(`n') clear
  generate byte trt = 0
  save `tmpfil0'
  drop _all
  ovbd bas res0 res1, means(`Experimental') ///
    corr(`Correlation') n(`n') clear
  generate byte trt = 1
  append using `tmpfil0'
  erase `tmpfil0'
  generate int pid = _n
  reshape long res, i(pid) j(tim)
  generate byte bia = trt * bas
  generate byte tia = trt * tim
  xtgee res bas trt tim bia tia, i(pid) t(tim) ///
    family(binomial) link(logit) corr(exchangeable)
  test trt bia tia
  return scalar A_p = r(p)
/*    xtlogit res bas trt tim bia tia, i(pid) re ///
    intmethod(mvaghermite) intpoints(30) */
  xtgee res bas trt tim bia tia, i(pid) t(tim) ///
    family(binomial) link(identity) corr(exchangeable)
  test trt bia tia
  return scalar B_p = r(p)
end
*
forvalues n = 250(50)350 {
  quietly simulate A_p = r(A_p) ///
    B_p = r(B_p), reps(1000): binsimem , n(`n')
  display in smcl as text _newline(1) "n = " as result %3.0f `n'
  capture noisily assert !missing(A_p) & !missing(B_p)
  generate byte A_pos = (A_p < 0.05)
  generate byte B_pos = (B_p < 0.05)
  summarize *_pos
}
local n 250
quietly simulate A_p = r(A_p) B_p = r(B_p), ///
reps(1000): binsimem , n(`n') null
display in smcl as text _newline(1) "NULL n = " as result %3.0f `n'
capture noisily assert !missing(A_p) & !missing(B_p)
generate byte A_pos = (A_p < 0.05)
generate byte B_pos = (B_p < 0.05)
summarize *_pos
exit

Edited results:

n = 250

  Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     A_pos |      1000        .754     .430894          0          1
     B_pos |      1000        .768    .4223202          0          1

n = 300

  Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     A_pos |      1000        .825    .3801572          0          1
     B_pos |      1000        .833    .3731625          0          1

n = 350

  Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     A_pos |      1000        .877    .3286016          0          1
     B_pos |      1000        .884    .3203852          0          1

[Snip]

NULL n = 250

[Snip]

  Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     A_pos |      1000        .046    .2095899          0          1
     B_pos |      1000        .051    .2201078          0          1

. exit

end of do-file


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Laura E. Gibbons, PhD
General Internal Medicine, University of Washington
Box 359780
Harborview Medical Center
325 Ninth Avenue, Seattle, WA  98104

phone: 206-744-1842   fax: 206-744-9917
Office address: 401 Broadway, Suite 5122
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index