# st: general statistical reasoning question in biomedical statistics (no Stata content)

From: "Christopher W. Ryan"
To: Statalist
Subject: st: general statistical reasoning question in biomedical statistics (no Stata content)
Date: Thu, 11 Dec 2003 12:11:36 -0500

Having read the Statalist FAQ, and previous correspondence about general statistical questions, I hope no one minds . . . .

Among my teaching duties in my medical school and family practice residency is "critical appraisal of the medical literature." I try to go over principles of good design and valid analysis. A question frequently comes up when we discuss randomized controlled trials. In these articles, there is almost always a "Table 1" that describes the baseline demographic and clinical variables of the two arms (say, placebo and active drug). There are usually *a lot* of baseline measurements. Each one is usually listed with a "P value," indicating whether the placebo and active drug subjects differed on that measurement.

Then the manuscript goes on to describe the rest of the study, and the results . . .

If the results show an advantage for the active drug, readers (including my students and residents) will often go back to "Table 1" and say, "Oh but look, the samples were not identical. Blah-blah was significantly higher in the placebo arm to begin with. Therefore I can't accept these results as valid."

I've never agreed with that. So I want to outline my chain of reasoning here and see if I've got it straight.

There are two premises in a randomized controlled trial with two arms:

1. The two samples are drawn randomly from the same population
2. The active drug actually has no effect (the null hypothesis)

And then there are the results (R).

If 1 and 2 are both true, we can look at R and calculate how likely we were to see results that "extreme" or more so. That's the P value. If P < the conventional 0.05, we say, "Gee, if 1 and 2 are both true, we *might* have seen results R, but only 5% of the time or less, and that's pretty unlikely. But we *did* see R. Therefore either 1 or 2 must be untrue. And I'm confident my randomization was solid. Therefore 2 must be untrue, and the drug really does have an effect."
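To make this chain of reasoning concrete, here is a minimal sketch in Python (not Stata) of a permutation test. It is only an illustration: the function name `permutation_p_value` and the outcome numbers in the usage note are hypothetical. Under premises 1 and 2 the arm labels are arbitrary, so the code reshuffles them many times and asks how often a difference in means at least as extreme as the observed one turns up.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_perm=5000, seed=0):
    """Two-sided permutation P value for a difference in means.

    If both premises hold (random allocation, no drug effect), every
    reassignment of subjects to arms was equally likely, so we estimate
    how often a reshuffled difference is as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel subjects at random
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1

    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_perm + 1)
```

Called with made-up outcomes such as `permutation_p_value([5.1, 4.8, 6.0, 5.5], [4.0, 3.8, 4.4, 4.1])`, a small return value means results R would be rare if both premises held. Nothing in the calculation asks whether the two arms matched on any baseline variable.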

There is nothing in this chain of reasoning that requires the samples to be identical or indistinguishable. And for every 20 baseline variables compared, you'd *expect* about 1 of them to have a P of < 0.05 from chance alone. The statistical techniques have "built-in" accommodation for this. This does not invalidate the conclusions.
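The "about 1 in 20" expectation can be checked with a small simulation. The sketch below is Python (not Stata) and rests on illustrative assumptions that do not come from any real trial: 20 independent, standard-normal baseline variables, 100 subjects per arm, and a large-sample two-sided z-test for each comparison. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def _mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def count_significant_baselines(n_per_arm=100, n_vars=20, alpha=0.05, rng=None):
    """Simulate one trial's Table 1 and count baseline P values below alpha.

    Both arms are drawn from the same population, so every "significant"
    baseline difference here is due to chance alone.
    """
    if rng is None:
        rng = random.Random()
    std_normal = NormalDist()
    significant = 0
    for _ in range(n_vars):
        arm_a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        mean_a, var_a = _mean_and_variance(arm_a)
        mean_b, var_b = _mean_and_variance(arm_b)
        std_err = (var_a / n_per_arm + var_b / n_per_arm) ** 0.5
        z = (mean_a - mean_b) / std_err
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided, normal approx.
        if p_value < alpha:
            significant += 1
    return significant


def average_false_alarms(n_trials=500, seed=1):
    """Average count of chance 'significant' baselines over many trials."""
    rng = random.Random(seed)
    counts = [count_significant_baselines(rng=rng) for _ in range(n_trials)]
    return sum(counts) / n_trials
```

Under these assumptions `average_false_alarms()` should come out close to 1, even though the randomization is perfect by construction. Real baseline variables are usually correlated, which changes how the count varies from trial to trial but not the expected count, provided each test holds its 5% level.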

It is a difficult concept for my learners to grasp. Or maybe I've got it wrong?

Thanks.

--Chris Ryan
