
From: Lee Sieswerda <[email protected]>
To: "'Jan Brogger'" <[email protected]>, Statalist <[email protected]>
Subject: st: RE: adjust prevalence (solution)
Date: Fri, 30 Aug 2002 19:35:10 -0400

I think the issue is that the terms "adjustment" and "standardization" are a bit muddled up in the literature of different disciplines. Using Jan's last data set as an example: Jan wants to sex-adjust his asthma prevalence at time1 so that it is comparable with the asthma prevalence at time0, which has a different sex distribution. In epidemiology, we (stupidly) interchange the terms "standardized" and "adjusted". What Jan is looking for is standardization.

But let's trace the problem from Jan's original attempt to use -adjust-. After running the logistic regression (logistic asthma sex time), we obtain the probabilities for the four possible combinations of predictors (i.e., covariate patterns) like so:

    gen prob = exp( -2.197225 + _b[sex]*sex + _b[time]*time)/(1+exp( -2.197225 +_b[sex]*sex + _b[time]*time))

or more simply,

    predict prob, pr
    tab prob

           prob |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             .1 |       1400       70.00       70.00
       .3999999 |        600       30.00      100.00
    ------------+-----------------------------------
          Total |       2000      100.00

With 4 covariate patterns, you would normally get four predicted probabilities, except that Jan has contrived an example where the sex-specific asthma rates are precisely the same at time0 and time1. Thus, you only get two predicted probabilities.

To get the conditional probabilities, you can substitute 0's and 1's into the above equation. For example, for males at time0, you would type:

    di exp( -2.197225 + _b[sex]*1 + _b[time]*0)/(1+exp( -2.197225 +_b[sex]*1 + _b[time]*0))
    .3999999

Now, from this equation, it becomes clear why -adjust- starts giving results unexpected by someone who uses "standardize" interchangeably with "adjust".
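For anyone following along, the substitution above can also be run without typing the constant by hand. This is only a minimal sketch: it assumes Jan's example data exactly as given in the quoted message below, and it assumes a Stata release that provides the invlogit() function (if yours does not, the exp()/(1+exp()) form above is equivalent). The two displayed values should be close to the .1 and .4 in the tabulation above.

```stata
* Sketch only: rebuild Jan's example data (taken from the quoted message)
clear
input time sex asthma freq
0 0 0 450
0 0 1 50
0 1 0 300
0 1 1 200
1 0 0 810
1 0 1 90
1 1 0 60
1 1 1 40
end
expand freq

* Same model as in the text
logistic asthma sex time

* Predicted probability at time0 for sex==0 and sex==1, using the stored
* constant _b[_cons] instead of a typed-in value
display invlogit(_b[_cons] + _b[sex]*0 + _b[time]*0)
display invlogit(_b[_cons] + _b[sex]*1 + _b[time]*0)
```

Using _b[_cons] avoids the small rounding error that comes from retyping the constant, which is one plausible reason the hand-typed version shows .3999999 rather than .4.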
But first, in epidemiological terms, to "adjust" for, say, sex using the direct standardization method, you multiply the standard sex distribution (the distribution at time0 in this case) by the sex-specific rates at time1, then add them together and divide by the total standard population:

    500*.1 = 50
    500*.4 = 200
    (200+50)/1000 = .25

Thus, .25 is the new "sex-standardized" asthma rate at time1. Put another way, it is the asthma rate conditional on the sex distribution being the same at time1 as at time0.

Now, consider what -adjust- does. Typing:

    adjust sex, by(time)

calculates the probability of asthma at the *average* of sex, which is .3 in the pooled sample:

    di exp( -2.197225 + _b[sex]*.3 + _b[time]*0)/(1+exp( -2.197225 +_b[sex]*.3 + _b[time]*0))
    di exp( -2.197225 + _b[sex]*.3 + _b[time]*1)/(1+exp( -2.197225 +_b[sex]*.3 + _b[time]*1))

each giving a value of .15980265, which doesn't seem to correspond to anything useful at all in terms of the "standardization" definition of "adjustment". The same applies if you set sex=.5. However, this isn't surprising when you consider that the value of _b[sex] was calculated based on the total sample of 2000. That is not the slope that Jan wants.

If you want to "adjust/standardize" the asthma rate at time1 to the standard sex population at time0, then maybe what you are really looking for is this:

    * Get the rate in the population to be standardized
    logistic asthma sex time if time==1
    * Apply the slope/rate derived from the population at time=1
    * to the standard population at time=0
    predict prob
    tab prob if time==0
    sum prob if time==0

This appears to work. But just to be sure, why don't we modify the data a little to remove the contrivance of equal sex-specific rates at time0 and time1 and check again:

    * NB: This probably won't work exactly the same for you.
    * It depends on how the data are sorted.
    * But what I'm doing is changing 7 values
    * of asthma at time==1 from 0 to 1
    replace asthma = 1 in 1176
    replace asthma = 1 in 1175
    replace asthma = 1 in 1177
    replace asthma = 1 in 1179
    replace asthma = 1 in 1180
    replace asthma = 1 in 1181
    replace asthma = 1 in 1182

    * Now estimate the model at time==1
    * and then predict to the whole sample
    logistic asthma sex time if time==1
    predict prob

    * Now apply the slope/rate from time1 to time0
    sum prob if time==0

        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
           yhat4 |    1000    .2538889   .1461842   .1077778         .4

    * According to this, the sex-standardized rate at time1
    * using the sex ratio at time0 will be .2538889
    * We can double-check that this is correct using -dstdize-
    gen pop = 1
    dstdize asthma pop sex, by(time) base(0)

    <snipped out most of output>

    Summary of Study Populations:
        time        N      Crude     Adj_Rate       Confidence Interval
     --------------------------------------------------------------------------
           0     1000   0.250000     0.250000    [  0.224824,    0.275176]
           1     1000   0.137000     0.253889    [  0.204823,    0.302955]

Notice that -dstdize- has produced exactly the same standardized rate (.253889) as the method using logistic regression above.

Whew, this problem of Jan's has been bugging me for days. I'm glad to finally have a working solution. I'd love to see any other comments or solutions.

Best,

Lee

Lee Sieswerda, Epidemiologist
Thunder Bay District Health Unit
999 Balmoral Street
Thunder Bay, Ontario
Canada P7B 6E7
Tel: +1 (807) 625-5957
Fax: +1 (807) 623-2369
[email protected]
www.tbdhu.com

> -----Original Message-----
> From: Jan Brogger [SMTP:[email protected]]
> Sent: Wednesday, August 28, 2002 6:03 AM
> To: Statalist
> Cc: [email protected]; 'Lee Sieswerda'; 'VISINTAINER PAUL'
> Subject: adjust prevalence (solution)
>
> If you have two populations with different prevalences of an outcome,
> but that also have different prevalences of covariates (confounders) -
> how do you compare these fairly?
>
> Logistic regression will adjust for the differences, and will give you
> an adjusted odds ratio for the differences between populations. But how
> about prevalences? The traditional method is via standardization but
> this cannot handle continuous variables which is a weakness. The
> -adjust- command and variants will adjust the prevalences so that they
> are comparable, but the population that it adjusts to is a not so simple
> meta-population with funny values for covariates [shown in previous
> post]. How to adjust to one of the populations?
>
> Let prevA be the unadjusted prevalence in population A, and prevB be the
> unadjusted prevalence in population B. We want prevalences adjusted to
> population A. Naturally, the prevalence in population A is adjusted to
> itself, and hence unchanged. But how to get the prevalence in population
> B? It seems intuitive to me that the right way is simply to use the
> adjusted odds ratio. Convert the unadjusted prevalence in population B
> to an odds, multiply by the adjusted odds ratio, and convert back to a
> prevalence - the adjusted prevalence. This makes sense. If you really
> believe that the adjusted odds ratio is the true odds ratio, due to the
> removal of confounding by other covariates, then the adjusted prevalence
> in population B represents the prevalence you would have observed if
> population B had the same distribution of covariates as population A.
>
> This program will calculate adjusted prevalences with the above method.
> It was a quick hack. Can anybody help in getting confidence intervals
> from this program, too? It needs documentation (short) but I don't
> write SMCL.
>
> *! 1.0.0 Jan Brogger [email protected] 28aug2002
> capture program drop adjust2
> program define adjust2 , rclass
>     version 7.0
>     syntax , by(varname) coeff(string)
>     preserve
>
>     qui {
>         tempname eb matcoeff
>         tab `by'
>         if `r(r)'!=2 {
>             di as err "Error in adjust: by variable `by' must have only two levels."
>         }
>
>         *First, get the two levels of the -by- variable
>         summ `by'
>         local levelA `r(min)'
>         local levelB `r(max)'
>
>         *Convert the target population prevalence to an odds
>         summ `e(depvar)' if `by'==`levelA' , meanonly
>         local prevA=`r(mean)'
>         local adjprevA=`prevA'
>         local oddsA=`prevA'/(1-`prevA')
>
>         *Get the unadjusted prevalence of population B
>         summ `e(depvar)' if `by'==`levelB' , meanonly
>         local prevB=`r(mean)'
>
>         *Get the odds ratio
>         matrix `eb'=e(b)
>         mat `matcoeff'=`eb'["y1","`coeff'"]
>         local coeff2=`matcoeff'[1,1]
>         local oddsratio=exp(`coeff2')
>
>         *Adjust the odds with the odds ratio and convert back
>         local oddsB=`oddsA'*`oddsratio'
>         local adjprevB=`oddsB'/(`oddsB'+1)
>
>     }
>
>     *Display this nicely
>
>     di as text "Unadjusted" _col(40) "Adjusted"
>     di "prevalences" _col(20) "Odds ratio" _col(40) "prevalences"
>     di as res "`levelA'" _col(10) "`levelB'" _col(40) "`levelA'" _col(50) "`levelB'"
>     di as text _dup(55) "-"
>     di as res %4.3f `prevA' _col(10) %4.3f `prevB' _col(20) %4.3f `oddsratio' /*
>         */ _col(40) %4.3f `adjprevA' _col(50) %4.3f `adjprevB'
>
>
>     return local unadjprevA = `prevA'
>     return local unadjprevB = `prevB'
>     return local oddsratio = `oddsratio'
>     return local adjprevA = `adjprevA'
>     return local adjprevB = `adjprevB'
>
>     restore
> end
>
> *Theoretical example
> *The adjusted prevalences should be the same
> *whereas the crude are not
> *This is due to (extreme) confounding by gender
>
> clear
> input time sex asthma freq
> 0 0 0 450
> 0 0 1 50
> 0 1 0 300
> 0 1 1 200
> 1 0 0 810
> 1 0 1 90
> 1 1 0 60
> 1 1 1 40
> end
> expand freq
> tab sex time , col nofreq
> bysort time: tab asthma sex , col nofreq
> logistic asthma time sex
> *adjust sex , by(time) pr format(%16.15f)
> adjust2 , by(time) coeff(time)
> ret li
>
> Yours sincerely,
>
> Jan Brogger, Institute of Medicine, University of Bergen, Norway

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

