Marcello Pagano

statalist@hsphsun2.harvard.edu

Re: st: general statistical reasoning question in biomedical statistics (no Stata content)

Thu, 11 Dec 2003

Let me be picky about just one point you make, because I see it repeated too often: the claim that "And for every 20 baseline variables compared, you'd *expect* about 1 of those baseline variables to have a P of < 0.05"

This is a misquote of a mathematical tautology that says that 5% of all tests (1 in 20) will fall into the 5% region. The proper statement is that this refers to *independent* tests. The oversight is especially important here because if ever one should question the independence assumption, it is in this situation. When we look at a number of baseline characteristics of the patients, it is more than likely that there is some dependence among them. For example, if the two arms are not balanced with respect to height, with one arm getting the shorter patients, then more than likely that arm will have the lighter patients too.

So when we do a number of related tests, we may get more or less than 5% significant by chance alone. It all depends on the structure of the dependence, a point that should be made to students.
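
For anyone who wants to show students how much the dependence structure can matter, here is a minimal simulation sketch in Python. It is only an illustration under assumptions that are not part of the discussion above: the 20 baseline comparisons are represented directly by null z-statistics, every pair of statistics shares the same correlation rho (a simple shared-factor construction), and the names count_significant and simulate are made up for this example. Real baseline variables will have a messier dependence structure.

```python
import random
import statistics


def count_significant(n_tests, rho, rng, z_crit=1.96):
    """Count how many of n_tests null z-statistics have |z| > z_crit.

    Each z-statistic is standard normal, and any two of them have
    correlation rho (assumed equicorrelation, for illustration only).
    """
    shared = rng.gauss(0.0, 1.0)  # factor common to all tests in one trial
    count = 0
    for _ in range(n_tests):
        own = rng.gauss(0.0, 1.0)  # noise specific to this one test
        z = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * own
        if abs(z) > z_crit:
            count += 1
    return count


def simulate(rho, n_trials=20000, n_tests=20, seed=2003):
    """Summarise the number of 'significant' tests per simulated trial."""
    rng = random.Random(seed)
    counts = [count_significant(n_tests, rho, rng) for _ in range(n_trials)]
    return {
        "mean": statistics.mean(counts),           # average count per trial
        "sd": statistics.pstdev(counts),           # spread of the count
        "p_none": counts.count(0) / n_trials,      # share of trials with none
    }


if __name__ == "__main__":
    for rho in (0.0, 0.8):  # independent tests vs. strongly related tests
        print("rho =", rho, simulate(rho))
```

With rho = 0 the count of significant tests is binomial(20, 0.05), so a trial shows no significant baseline difference with probability 0.95^20, about 0.36. In this particular construction the long-run average count stays close to 1 for both values of rho, but with rho = 0.8 the count in any single trial is far more variable: most trials show nothing at all and a few show many significant differences at once.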

m.p.

Christopher W. Ryan wrote:

Having read the Statalist FAQ, and previous correspondence about general statistical questions, I hope no one minds . . . .

Among my teaching duties in my medical school and family practice residency is "critical appraisal of the medical literature." I try to go over principles of good design and valid analysis. A question frequently comes up when we discuss randomized controlled trials. In these articles, there is almost always a "Table 1," that describes the baseline demographic and clinical variables of the two arms (say,

