The Stata listserver

Re: st: ANCOVA for pre post designs

From   Constantine Daskalakis
Subject   Re: st: ANCOVA for pre post designs
Date   Tue, 23 Dec 2003 19:10:24 -0500

At 06:12 PM 12/23/2003, David Airey wrote:
This is a question for the biostatisticians on the list.

I'm thinking of formulating a commentary on accepted research procedures in my area that I think could be improved by observing basic statistical arguments presented to researchers by biostatisticians.

It has been suggested that in a randomized clinical trial design with baseline (B) and followup (F) test measures comparing a control and treatment group (G), performing an ANOVA on the post/pre ratio is the worst choice of the four ways to deal with baseline differences:

(1) post: analyze F by G
(2) difference: analyze F-B by G
(3) ratio: analyze F/B by G
(4) ancova: analyze F = constant + b1*B + b2*G, for G differences

In light of biostatisticians' suggestion (e.g., Vickers, BMC Medical Research Methodology (2001) 1:6) that method (4) above is the most preferred and method (3) the least preferred, does this apply to the "prepulse inhibition" literature?
In large trials, (1) should be fine (at least, in terms of no bias). But (2) or (4) may be more efficient.
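
As a rough illustration of the four analyses listed above, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: the sample size, the baseline/follow-up correlation of 0.6, the additive treatment effect of 2, and the helper names simulate, mean_diff, and ancova_effect. The ANCOVA coefficient b2 is computed from the pooled within-group slope, which matches least squares for F = constant + b1*B + b2*G.

```python
import random
import statistics as st

def simulate(n_per_group=2000, effect=2.0, rho=0.6, seed=1):
    """Simulate a randomized pre/post trial. All numbers are hypothetical."""
    rng = random.Random(seed)
    b, f, g = [], [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            base = rng.gauss(50, 10)
            # Noise scaled so that F has SD 10 and corr(B, F) = rho
            noise = rng.gauss(0, 10 * (1 - rho**2) ** 0.5)
            b.append(base)
            f.append(50 + rho * (base - 50) + noise + effect * group)
            g.append(group)
    return b, f, g

def mean_diff(values, g):
    """Treatment-group mean minus control-group mean."""
    treated = [v for v, k in zip(values, g) if k == 1]
    control = [v for v, k in zip(values, g) if k == 0]
    return st.fmean(treated) - st.fmean(control)

def ancova_effect(b, f, g):
    """Coefficient b2 in F = constant + b1*B + b2*G (least squares)."""
    sxy = sxx = 0.0
    for group in (0, 1):
        bs = [x for x, k in zip(b, g) if k == group]
        fs = [y for y, k in zip(f, g) if k == group]
        mb, mf = st.fmean(bs), st.fmean(fs)
        sxy += sum((x - mb) * (y - mf) for x, y in zip(bs, fs))
        sxx += sum((x - mb) ** 2 for x in bs)
    b1 = sxy / sxx  # pooled within-group slope of F on B
    return mean_diff(f, g) - b1 * mean_diff(b, g)

b, f, g = simulate()
estimates = {
    "(1) post": mean_diff(f, g),
    "(2) difference": mean_diff([y - x for x, y in zip(b, f)], g),
    "(3) ratio": mean_diff([y / x for x, y in zip(b, f)], g),
    "(4) ancova": ancova_effect(b, f, g),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this setup (1), (2), and (4) all estimate the same additive effect, while (3) is on a different (ratio) scale and is not directly comparable.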

(3) above is similar in flavor to (2) if you view it on the log scale, i.e.,

(logF-logB) by G (or, equivalently, log(F/B) by G).

A technical question is which of these conforms most closely to the assumptions of linear regression (normality of residuals, homoskedasticity): the original measurements (B and F), their difference on the original scale, or their log-ratio (i.e., the difference of logs).

Still, I wouldn't do it on (F/B) but rather on log(F/B) if that looks good.

There is a difference in the underlying scientific model and interpretation, of course.

Does the treatment work additively (ie, adds a fixed amount, no matter where you start)? If so, the difference (F-B) would be a good choice (constant additive treatment effect across all values of B). And you'll be talking about the (arithmetic) mean difference for treatment vs. control.

But if the treatment works multiplicatively (i.e., increases/decreases your original B measurement by a certain percent), then log(F/B) = logF - logB would be better. And then, by exponentiating the regression coefficients etc., you'll be talking about the geometric mean ratio for treatment vs. control.
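
To make the multiplicative case concrete, here is a small Python sketch with made-up numbers (the (B, F) pairs and variable names are hypothetical). It analyzes log(F/B) by group and exponentiates the difference in means, which gives the ratio of the geometric means of F/B for treatment vs. control.

```python
import math
import statistics as st

# Hypothetical (B, F) pairs; treated subjects end up about 20% lower
# than comparable controls, relative to their own baseline.
control = [(100, 100.0), (80, 88.0), (120, 108.0)]
treated = [(100, 80.0), (80, 70.4), (120, 86.4)]

def log_ratios(pairs):
    """log(F/B) for each subject, i.e. logF - logB."""
    return [math.log(f / b) for b, f in pairs]

# Difference in mean log-ratio between groups (treatment minus control)
diff_of_logs = st.fmean(log_ratios(treated)) - st.fmean(log_ratios(control))

# Exponentiating gives the geometric mean ratio
gmr = math.exp(diff_of_logs)
print(f"geometric mean ratio: {gmr:.3f}")  # roughly 0.8 with these made-up numbers

# Same quantity computed directly from the geometric means of F/B
direct = (st.geometric_mean([f / b for b, f in treated])
          / st.geometric_mean([f / b for b, f in control]))
```

A ratio below 1 here reads as a percent reduction relative to control, which is the interpretation the log scale buys you.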

Finally, the choice among (1), (2), and (4) depends on the correlation between baseline and follow-up measurements. I think the 0.5 cutoff applies to (1) versus (2): when corr(B,F) < 0.5, (1) is more efficient than (2); otherwise, (2) is better. In large samples, (4) should be at least as efficient as either, whatever the correlation. I believe there's a paper by Liang & Zeger on this.
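
One way to check efficiency claims like this is to write down the large-sample variance of each treatment-effect estimator relative to the post-only analysis. The sketch below assumes a randomized trial with equal variance at baseline and follow-up and a common correlation rho in both groups; these assumptions and the function name relative_variances are illustrative, not from the original post. Under them, Var(F) is proportional to 1, Var(F-B) to 2(1-rho), and the ANCOVA residual variance to 1-rho^2.

```python
def relative_variances(rho):
    """Large-sample variance of each estimator, relative to post-only (1).

    Assumes equal variances for B and F and correlation rho (hypothetical setup).
    """
    return {
        "post": 1.0,                  # (1) analyze F
        "change": 2.0 * (1.0 - rho),  # (2) analyze F - B
        "ancova": 1.0 - rho ** 2,     # (4) F adjusted for B
    }

for rho in (0.2, 0.5, 0.8):
    rv = relative_variances(rho)
    best = min(rv, key=rv.get)
    print(f"rho={rho}: " + ", ".join(f"{k}={v:.2f}" for k, v in rv.items())
          + f" -> smallest: {best}")
```

Smaller is better here; comparing the "post" and "change" entries shows where the 0.5 cutoff comes from.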


Constantine Daskalakis, ScD
Assistant Professor,
Biostatistics Section, Thomas Jefferson University,
211 S. 9th St. #602, Philadelphia, PA 19107
Tel: 215-955-5695
Fax: 215-503-3804