
From: "Garth Rauscher" <garthr@uic.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Binomial regression
Date: Sat, 4 Aug 2007 09:34:53 -0500

When dealing with prevalent outcomes (mammography use, smoking, etc), I
greatly prefer the interpretation of risk difference over risk ratio or
(and especially) odds ratio, since the OR loses all meaning as an
estimate of the risk ratio as the outcome prevalence increases. In these
situations I tend to think in terms of the percentage-point change in
the outcome (the risk or probability difference) associated with an
intervention or exposure.

Perhaps the best way to estimate an "adjusted" probability difference is
to estimate a standardised risk difference. Stata is the only package I
am aware of that has built-in routines to estimate RD standardised
either to the distribution of covariates in the exposed portion of your
sample, or the unexposed portion of your sample. And it's a simple
matter to also estimate RD standardised to the distribution of
covariates in the entire sample.

This method makes no modelling assumptions. Your dependent variable (Y)
and primary exposure variable (X) must be binary. A downside is that any
continuous independent variables you want to control for have to be
categorised so that every possible combination of x, y and all other
covariates is represented with data. For example, if you have only one
binary covariate (C) to control for, then you must have data present in
all 8 possible combinations of x, y and c.

Below is a program to estimate probability (risk or prevalence)
difference standardised to the distribution of covariates in the entire
sample. I had to gerrymander the sysuse auto dataset in order to get
enough obs in each category, since it's a tiny dataset and the
covariates were strongly related to Y (foreign).

// gerrymandering--for example purpose only!!!
sysuse auto, clear
cd "c:\datasets"
save autonew, replace
sysuse auto, clear
// quadruple observations- for example purposes only!!!
append using autonew
append using autonew
append using autonew
gen turn2=.
replace turn2= 0 if turn <35
replace turn2= 1 if turn >=35
replace turn2= . if turn >99        // missing values sort above 99, so reset them to missing
gen price2=.
replace price2=0 if price <= 4453
replace price2=1 if price > 4453
replace price2=. if price > 99999   // same reset for missing price
groups foreign turn2 price2, nolabel   // groups is a user-written command (available from SSC)
save autonew, replace
*---------------------------------------------------------------------------
// local x = your binary exposure var (0,1)
// local y = your binary dependent var (0,1)
// local z = your standardization vars
local x = "price2"
local y = "foreign"
local z = "turn2"
mark touse                  /* These commands tell stata not to include obs with missing data */
markout touse `z'           /* on the standardization variables, o/w it will not work */
keep if touse==1
groups `y' `x' `z'          // Check: May not work if data are chopped too finely
sort `z'                    /* These two commands create the standardization weights: */
by `z': gen wgt = _N        /* the number of obs in each stratum of the standardization vars */
cs `y' `x', by(`z') standard(wgt) rd
*---------------------------------------------------------------------------

Garth Rauscher
Assistant Professor
SPH Epidemiology and Biostatistics
University of Illinois at Chicago
1603 W Taylor St (M/C/ 923)
Chicago, IL 60612
(312) 413-4317 phone
(312) 996-0064 fax
garthr@uic.edu email

On 8/3/2007 11:35 Constantine Daskalakis wrote:
> Hey Marcello.
>
> Shouldn't you be on vacation? :)
>
> Certainly, you can recover the absolute risks for each covariate pattern
> (or observation), but what about the risk DIFFERENCE?
>
> The point is this:
>
> If the Xs are additive on the log-odds scale, we can fit a logistic
> regression and report a single OR for each X as a summary measure.
> Additivity on the log-odds scale implies non-additivity on the
> probability scale, so there is no single RD for X (depends on the actual
> value of X).
>
> But suppose the Xs are (more) additive on the original probability scale
> rather than the log-odds scale.
> Then, it would be best to report a
> single RD for each X as a summary measure of effect (rather than a
> single OR, which is not really appropriate in this situation). Now, from
> a logistic regression without interactions, we get a single OR for each
> X, but an infinite number of RDs for the same X. That's not very good.
>
> On the other hand, we can retrieve a single summary RD from a
> main-effects-only binomial regression model. And, by the way, a logistic
> regression would need a bunch of interactions to have comparable fit to
> this binomial regression, and would still not provide us with a single
> summary RD.
>
> There's also causal inference stuff regarding causal interpretations for
> RD, but not for RR or OR. So, that's additional motivation for focusing
> on the RD.
>
> Bottom line: The goal here (get summary RD) CANNOT be achieved via
> logistic regression.
>
> Finally, there's nothing about a "uniform distribution" here. It's just
> a generalized linear model with the identity link -- a different model
> for how the risk changes as a function of covariates (and the same
> binomial error structure that logistic regression uses). There's no
> substantive reason to prefer one link over another. Why would the risk
> follow the logistic function rather than any other curve (including a
> line)? The correct/appropriate link function will depend on the data at
> hand. Sometimes it will be a line, sometimes a logistic, sometimes some
> other unknown beast.
>
> I think that we are getting to the point where logistic has become so
> ingrained that many people think of it (unconsciously?) as "have
> screwdriver, will always use screwdriver (with binary outcome)." The
> logit is the canonical link function for the Bernoulli/binomial and
> implementation of regression with the logit link is easier than anything
> else. But that's just ease of programming and custom.
>
> For history buffs, this goes back to Cornfield (Bull Int Statist Inst,
> 1961) who 'tricked' programs designed for discriminant analysis to do
> logistic regression and to Walker & Duncan (Biometrics, 1967) who looked
> at the maximum likelihood approach. At the time, lack of computing
> resources made such things impractical for other types of regressions
> for binary outcomes. But that's half a century ago.

CD

On 8/2/2007 10:45 PM, Marcello Pagano wrote:
> Sorry to disagree with your first sentence, Constantine.
> Logistic regression stipulates a linear relationship of covariates with
> the log of the odds of an event (not odds ratios). From this it is
> straightforward to recover the probability (or risk, if you prefer that
> label) of the event.
> Don't understand your aversion to logistic regression to achieve what
> you want to achieve.
> If you don't like the shape of the logistic, then any other cdf will
> provide you with a transformation to obey the constraints inherent in
> modeling a probability. The uniform distribution that you wish to use
> has to be curtailed, as others have pointed out.
> m.p.
>
>
> Constantine Daskalakis wrote:
>> No argument about logistic regression. But that gives you odds ratios.
>> What if you want risk differences instead?
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
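
A note on the standardisation step in the program above. With weights equal to the number of observations in each stratum, the standardised RD is a weighted average of the stratum-specific risk differences: RD_std = sum_k w_k (p1k - p0k) / sum_k w_k, where p1k and p0k are the risks among exposed and unexposed in stratum k. The sketch below does that arithmetic by hand on the unmodified auto dataset. It is only an illustration: the cutpoints are copied from the program above, the names p1, p0 and rd_k are made up for this example, and it assumes every stratum of turn2 contains both exposed and unexposed observations.

```stata
sysuse auto, clear

* Binary exposure and stratification variable, same cutpoints as above
gen byte price2 = price > 4453 if !missing(price)
gen byte turn2  = turn >= 35   if !missing(turn)

* Stratum-specific risk of foreign among exposed (p1) and unexposed (p0)
bysort turn2: egen double p1 = mean(cond(price2 == 1, foreign, .))
bysort turn2: egen double p0 = mean(cond(price2 == 0, foreign, .))

* Stratum-specific risk difference, constant within each stratum of turn2
gen double rd_k = p1 - p0

* Averaging over observations weights each stratum by its size, which
* standardises to the covariate distribution of the whole sample
summarize rd_k
display "RD standardised to the whole sample: " %6.4f r(mean)
```

If cs uses the usual weighted-average formula, the mean of rd_k should agree with the standardised RD that cs reports when the weight variable holds the stratum size.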

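The main-effects binomial model with identity link that Constantine describes in the quoted message can be tried in Stata with binreg and its rd option, which reports coefficients as risk differences. What follows is a minimal sketch on the same example variables, not something from the thread. The choice of covariates is purely illustrative, and identity-link binomial models can fail to converge or produce fitted risks outside [0,1], so treat it as a starting point for experiment.

```stata
sysuse auto, clear

* Same illustrative binary variables as in the program above
gen byte price2 = price > 4453 if !missing(price)
gen byte turn2  = turn >= 35   if !missing(turn)

* Binomial regression with identity link: one summary RD per covariate
binreg foreign price2 turn2, rd

* For comparison, logistic regression gives one summary OR per covariate
logistic foreign price2 turn2
```

Comparing the two outputs shows the contrast discussed in the thread: the first model summarises each covariate with a risk difference, the second with an odds ratio.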
