Re: st: Binomial regression

From   Marcello Pagano
Subject   Re: st: Binomial regression
Date   Fri, 03 Aug 2007 13:40:57 -0400

Everybody keeps asking why I am not on vacation. Maybe I should listen!

I agree wholeheartedly that the risk difference is sometimes preferable to the odds ratio. Witness what is currently going on with the attack on Avandia. Rather than report a risk difference of 0.2 percentage points in the MI rate, we are faced with a risk INCREASE of 40% -- the effect of going from 0.5% to 0.7%. If reported as a risk difference, it would probably not have made the headlines it has, nor created the furor it has. [Before I get attacked for being too esoteric, for those of you who are interested in seeing a very low-grade meta-analysis, take a look at the New England Journal of Medicine in May for the article by Steven Nissen. So far the article has had the effect of a 30% decline in the sales of the drug Avandia.] Plus, to do the analysis in the odds-ratio domain, rather than in the risk-difference domain, they had to get rid of about a third of the studies that showed no difference; i.e., studies that favoured the null hypothesis of no difference. There are other major problems with this analysis, too, but all the studies in the meta-analysis were RCTs and thus there is no need to look at odds ratios.
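
For readers who want to check the arithmetic, here is a minimal Python sketch of the two summaries applied to the 0.5% and 0.7% figures quoted above. The function and variable names are illustrative only and do not come from any published analysis.

```python
def risk_difference(p_control, p_treated):
    """Absolute change in risk (as a proportion)."""
    return p_treated - p_control


def relative_risk_increase(p_control, p_treated):
    """Change in risk expressed as a fraction of the control risk."""
    return (p_treated - p_control) / p_control


# Risks quoted in the message: 0.5% versus 0.7%
p_control, p_treated = 0.005, 0.007

rd = risk_difference(p_control, p_treated)
rri = relative_risk_increase(p_control, p_treated)

print(f"risk difference:        {rd:.3%}")
print(f"relative risk increase: {rri:.0%}")
```

The same pair of risks gives a small number on the absolute scale and a large one on the relative scale, which is the contrast being drawn.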

Anyway, if you have the risks you can, of course, calculate the differences of the risks. The problem, as you point out, is how this risk difference is related to other covariates. This, I still maintain, is dependent on the situation at hand. And however convenient it would be to express this as a linear function of other covariates, it may neither fit the data, nor fit into the rather important constraint that probabilities should be between zero and one. Those are the only points I am making---life may be a little more complex and may not allow a constant risk difference in certain situations.

Finally, yes the cdf of the uniform distribution, as opposed to the logistic distribution, say, is what gives you a straight line transformation between the limits of the uniform. So, yes, a uniform is involved.
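
A small Python sketch of that point, under assumed values: the intercept `a` and slope `b` below are hypothetical, chosen only so that the linear predictor leaves the unit interval. The uniform cdf is a straight line between its limits and flat outside them, while the logistic cdf bends everywhere and never needs truncating.

```python
import math


def uniform_cdf(x, lo=0.0, hi=1.0):
    """CDF of Uniform(lo, hi): linear on [lo, hi], constant outside."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def logistic_cdf(x):
    """CDF of the standard logistic distribution (the inverse logit)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical linear predictor a + b*x (values chosen for illustration)
a, b = 0.1, 0.2
for x in (-2, 0, 2, 4, 6):
    eta = a + b * x
    print(f"x={x:>2}  eta={eta:5.2f}  "
          f"uniform={uniform_cdf(eta):.3f}  logistic={logistic_cdf(eta):.3f}")
```

Inside the limits, equal steps in the linear predictor give equal steps in the uniform-cdf probability; outside them the probability is pinned at 0 or 1, which is the constraint mentioned earlier.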


Constantine Daskalakis wrote:

Hey Marcello.

Shouldn't you be on vacation? :)

Certainly, you can recover the absolute risks for each covariate pattern
(or observation), but what about the risk DIFFERENCE?

The point is this:

If the Xs are additive on the log-odds scale, we can fit a logistic
regression and report a single OR for each X as a summary measure.
Additivity on the log-odds scale implies non-additivity on the
probability scale, so there is no single RD for X (depends on the actual value of X).
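
To make this concrete, here is a short Python sketch with a made-up odds ratio of 2 for a binary X. The OR is held fixed, and the implied RD is computed at several hypothetical baseline risks; none of these numbers come from real data.

```python
import math


def inv_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical effect of a binary X: odds ratio of 2 (log-odds shift of log 2)
beta_x = math.log(2.0)


def rd_logistic(baseline_risk):
    """Risk difference implied by the fixed OR at a given baseline risk."""
    eta0 = math.log(baseline_risk / (1.0 - baseline_risk))
    return inv_logit(eta0 + beta_x) - baseline_risk


for p0 in (0.01, 0.10, 0.50):
    print(f"baseline risk {p0:.2f}: RD = {rd_logistic(p0):.4f}")
```

One OR, a different RD at every baseline risk: the risk difference changes with wherever the other covariates place the baseline.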

But suppose the Xs are (more) additive on the original probability scale
rather than the log-odds scale. Then, it would be best to report a
single RD for each X as a summary measure of effect (rather than a
single OR, which is not really appropriate in this situation). Now, from
a logistic regression without interactions, we get a single OR for each
X, but an infinite number of RDs for the same X. That's not very good.

On the other hand, we can retrieve a single summary RD from a
main-effects-only binomial regression model. And, by the way, a logistic
regression would need a bunch of interactions to have comparable fit to
this binomial regression, and would still not provide us with a single
summary RD.
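
The mirror image can be sketched the same way in Python. Assume, purely for illustration, a main-effects model that is additive on the probability scale with the hypothetical coefficients below (this is not a fitted model). The RD for X is then the same at every level of a second covariate Z, while the OR for X drifts, which is why a logistic fit would need interaction terms to follow it.

```python
def risk_identity(x, z, b0=0.05, bx=0.10, bz=0.20):
    """Risk under an identity-link model; all coefficients are hypothetical."""
    return b0 + bx * x + bz * z


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def rd_and_or(z):
    """RD and OR for X (1 versus 0) at a given level of Z."""
    p0 = risk_identity(0, z)
    p1 = risk_identity(1, z)
    return p1 - p0, odds(p1) / odds(p0)


for z in (0, 1, 2):
    rd, odds_ratio = rd_and_or(z)
    print(f"z={z}: RD = {rd:.2f}, OR = {odds_ratio:.3f}")
```

The coefficients were chosen so that every risk stays inside (0, 1); with other values the identity link can produce impossible probabilities.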

There's also causal inference stuff regarding causal interpretations for
RD, but not for RR or OR. So, that's additional motivation for focusing on the RD.

Bottom line: The goal here (get summary RD) CANNOT be achieved via
logistic regression.

Finally, there's nothing about a "uniform distribution" here. It's just
a generalized linear model with the identity link -- a different model
for how the risk changes as a function of covariates (and the same
binomial error structure that logistic regression uses). There's no
substantive reason to prefer one link over another. Why would the risk
follow the logistic function rather than any other curve (including a
line)? The correct/appropriate link function will depend on the data at
hand. Sometimes it will be a line, sometimes a logistic, sometimes some
other unknown beast.

I think that we are getting to the point where logistic has become so
ingrained that many people think of it (unconsciously?) as "have
screwdriver, will always use screwdriver (with binary outcome)." The
logit is the canonical link function for the Bernoulli/binomial and
implementation of regression with the logit link is easier than anything
else. But that's just ease of programming and custom.

For history buffs, this goes back to Cornfield (Bull Int Statist Inst, 1961) who 'tricked' programs designed for discriminant analysis to do logistic regression and to Walker & Duncan (Biometrics, 1967) who looked at the maximum likelihood approach. At the time, lack of computing resources made such things impractical for other types of regressions for binary outcomes. But that's half a century ago.


On 8/2/2007 10:45 PM, Marcello Pagano wrote:

Sorry to disagree with your first sentence, Constantine.
Logistic regression stipulates a linear relationship of covariates with the log of the odds of an event (not odds ratios). From this it is straightforward to recover the probability (or risk, if you prefer that label) of the event.
I don't understand your aversion to using logistic regression to achieve what you want.
If you don't like the shape of the logistic, then any other cdf will provide you with a transformation to obey the constraints inherent in modeling a probability. The uniform distribution that you wish to use has to be curtailed, as others have pointed out.

Constantine Daskalakis wrote:

No argument about logistic regression. But that gives you odds ratios. What if you want risk differences instead?
