st: Can I repeatedly sample with constraints from an unbalanced data set to balance it?

From: "Paul Walsh"
To: statalist@hsphsun2.harvard.edu
Subject: st: Can I repeatedly sample with constraints from an unbalanced data set to balance it?
Date: Sat, 27 Oct 2007 08:47:29 -0700

```
I have a 700-subject data set from a clinical trial comparing two
treatments on one outcome (hospital admission from an emergency room)
for a particular disease. I am also using a three-point ordinal scale
of disease severity that strongly predicts hospital admission
regardless of treatment. Though the trial was designed to balance the
two treatment arms, calculating disease severity in the emergency room
is too cumbersome for it to have been used to balance each arm with
equal numbers of each severity category. Thus the distribution of case
severity is unequal in the two arms. When I calculate the unadjusted
risk ratio (RR) of admission for the two treatments I obtain a low,
non-significant crude RR, similar to already published studies that
did not account for severity. When I model the treatments and include
the severity score, the adjusted RR increases and is significant,
demonstrating superiority of one treatment over the other.

The manuscript reviewers feel that the study should have balanced the
severity scores in both treatment arms instead of including severity
as a variable. I'd like to run jackknife or bootstrap estimations of
the unadjusted RR by constraining each jackknife/bootstrap replicate
to select equal numbers of patients receiving each treatment with each
severity score. The goal is to repeatedly select samples from the data
set that produce equal numbers of patients in each of the six groups
(two treatments, three severity classifications). Can someone comment
on the feasibility of doing this in the bootstrap/jackknife context?
Since this is not random sampling from the data set, how would this
procedure affect the interpretation of bootstrapped/jackknifed results?
If feasible and interpretable, can someone suggest some code that
would do this, or suggest another way of achieving the same goals?

Paul Walsh

Bakersfield CA
```
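
To make the procedure described in the post concrete, here is a minimal sketch of it in Python. It is not an answer from the list and not a recommendation. It only illustrates drawing the same number of patients, with replacement, from each of the six treatment-by-severity cells and computing the crude RR on every balanced replicate. The field names `treatment` (coded 0/1), `severity`, and `admitted` (coded 0/1), the function names, and the defaults for `n_per_cell` and `seed` are all assumptions made for illustration; none of them come from the trial data.

```python
import random
from collections import defaultdict


def balanced_resample(records, n_per_cell, rng):
    """Draw n_per_cell records WITH replacement from each
    (treatment, severity) cell, so every cell ends up the same size."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["treatment"], rec["severity"])].append(rec)
    sample = []
    for key in sorted(cells):
        sample.extend(rng.choices(cells[key], k=n_per_cell))
    return sample


def risk_ratio(records):
    """Crude (unadjusted) risk ratio of admission, treatment 1 vs treatment 0.
    Assumes treatment is coded 0/1 and admitted is coded 0/1."""
    admitted = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for rec in records:
        total[rec["treatment"]] += 1
        admitted[rec["treatment"]] += rec["admitted"]
    risk0 = admitted[0] / total[0]
    risk1 = admitted[1] / total[1]
    # Raises ZeroDivisionError if nobody in arm 0 was admitted.
    return risk1 / risk0


def balanced_rr_distribution(records, n_per_cell, reps, seed=12345):
    """Crude RR from each of `reps` balanced resamples (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        risk_ratio(balanced_resample(records, n_per_cell, rng))
        for _ in range(reps)
    ]
```

On interpretation: because every replicate gives both arms the same uniform severity mix, the crude RR computed this way targets an RR standardized to equal weights across the three severity levels, which need not match the severity mix actually seen in the trial. In Stata, `bsample` with its `strata()` option appears to offer the same kind of fixed-size, per-stratum draw, though that is untested here.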