# st: model-based standardization

 From "Garth Rauscher" <[email protected]> To <[email protected]> Subject st: model-based standardization Date Mon, 4 Feb 2008 22:11:39 -0600

```[I tried to send this message to the listserv a few days ago but don't think
it made it through so I am trying again. I apologize if this is a duplicate
message.]

Dear listserv members,

I am attempting to learn how to perform a model-based standardization with
Stata, using the marginal or predictive margins method.  I would like to
estimate probabilities and probability differences from a logistic
regression, standardized to the distribution of the modeled covariates.
The idea is summarized in: "Greenland S. Model-based estimation
of relative risks and other epidemiologic measures in studies of common
outcomes and in case-control studies. Am J Epidemiol 2004;160:301-305." To
the best of my understanding, the method involves estimating predicted
probabilities of Y under two scenarios (e.g. X=1 and X=0). Assuming we have
a dependent variable Y(0,1), an exposure of interest X(0,1), and covariates
r2 r3 a2 a3 a4 a5, the two sets of predicted probabilities could be:

P0(x) based on the joint distribution of covariates, with X=0 assigned to
everyone
P1(x) based on the joint distribution of covariates, with X=1 assigned to
everyone
PD(x) as the difference in probabilities,  P1(x) - P0(x)

Below is my code.

logit Y X r2 r3 a2 a3 a4 a5

// predicted xbetas after assigning all observations to X=0

g if0=_b[_cons]+_b[X]*0+_b[r2]+_b[r3]+_b[a2]+_b[a3]+_b[a4]+_b[a5]

// predicted xbetas after assigning all observations to X=1

g if1=_b[_cons]+_b[X]*1+_b[r2]+_b[r3]+_b[a2]+_b[a3]+_b[a4]+_b[a5]

// predicted probabilities

g p0x = invlogit(if0)
g p1x = invlogit(if1)

groups p0x p1x

+---------------------------------------+
|      p0x        p1x   Freq.   Percent |
|---------------------------------------|
| .1368349   .4431594     444    100.00 |
+---------------------------------------+

I was expecting two new variables of predicted probabilities, p0x and p1x,
with a range of values that depended on the covariates. However, I noticed
that p0x and p1x each had only one value instead of the range of values I
had expected (see above).  Any clarification as to what I am doing
incorrectly would be appreciated. I think my next task would be to
bootstrap to get confidence intervals from the distribution of means for
p0x, p1x and PD(x).

Garth Rauscher
Division of Epid/Bios (M/C 923)
UIC School of Public Health
1603 West Taylor Street
Chicago, IL 60612
ph: (312)413-4317
fx:  (312)996-0064
em: [email protected]

```
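One possible reading of the output above: the two `g` lines add up the coefficients without multiplying each one by its covariate (`_b[r2]` rather than `_b[r2]*r2`), so every observation gets the same linear predictor and hence the same probability. The sketch below is one way the calculation described in the post could be done, not a definitive implementation. It lets `predict` apply each observation's own covariate values after X has been temporarily set to 0 and then to 1, and wraps the same steps in a small program for `bootstrap`. The simulated data, the program name `stdrisk`, and the variable names `X_obs` and `pdx` are assumptions made for illustration and do not come from the original message.

```stata
* Simulated stand-in data so the sketch runs on its own.
* These are NOT the poster's data; skip this part if your data are in memory.
clear
set seed 2008
set obs 444
generate byte X = runiform() < 0.5
foreach v in r2 r3 a2 a3 a4 a5 {
    generate byte `v' = runiform() < 0.3
}
generate byte Y = runiform() < invlogit(-1.5 + 1.2*X + 0.5*r2 - 0.4*a3)

* Fit the model, then predict with X forced to 0 and to 1 for everyone.
logit Y X r2 r3 a2 a3 a4 a5
clonevar X_obs = X                      // keep the observed exposure
replace X = 0
predict double p0x if e(sample), pr     // uses each row's own covariates
replace X = 1
predict double p1x if e(sample), pr
replace X = X_obs                       // restore the observed exposure
drop X_obs
generate double pdx = p1x - p0x
summarize p0x p1x pdx                   // the means are the standardized values

* Hypothetical helper (the name stdrisk is made up for this sketch):
* repeats the steps above and returns the three means for bootstrap.
capture program drop stdrisk
program define stdrisk, rclass
    tempvar xobs p0 p1
    tempname m0 m1
    quietly logit Y X r2 r3 a2 a3 a4 a5
    quietly clonevar `xobs' = X
    quietly replace X = 0
    quietly predict double `p0' if e(sample), pr
    quietly replace X = 1
    quietly predict double `p1' if e(sample), pr
    quietly replace X = `xobs'
    summarize `p0', meanonly
    scalar `m0' = r(mean)
    summarize `p1', meanonly
    scalar `m1' = r(mean)
    return scalar p0 = `m0'
    return scalar p1 = `m1'
    return scalar pd = `m1' - `m0'
end

* Refit the model in every resample so the intervals reflect model uncertainty.
bootstrap p0=r(p0) p1=r(p1) pd=r(pd), reps(200) seed(12345): stdrisk
```

With real data, `p0x` and `p1x` should now vary across observations whenever the covariates do. In Stata 11 and later, fitting the model with `i.X` and then running `margins X` and `margins, dydx(X)` should give comparable standardized probabilities and their difference without the manual steps; that command did not exist when this message was posted.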