[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: model-based standardization

From   "Austin Nichols" <>
Subject   Re: st: model-based standardization
Date   Tue, 5 Feb 2008 00:30:21 -0500

I haven't looked at the ref you cite, but what you want is one of the
several quantities that economists refer to as marginal effects (see
e.g. the description of -margfx- and SE calcs at

Your code fails because you have not multiplied estimated coefficients
by variables.
Try instead e.g.
logit Y X r2 r3 a2 a3
g p0=invlogit(_b[_cons]+ _b[r2]*r2+_b[r3]*r3+_b[a2]*a2+_b[a3]*a3)
g p1=invlogit(_b[_cons]+_b[X]+_b[r2]*r2+_b[r3]*r3+_b[a2]*a2+_b[a3]*a3)

but it's easier to use predict in any case:
ren X wasX
g X=0
predict pr0
replace X=1
predict pr1
drop X
ren wasX X

though you still have to make sure there are no neglected connections
between X and other variables (e.g. interactions).

To bootstrap, simply wrap it in a program:

prog dp
cap drop pr0 pr1 dp
logit Y X r2 r3 a2 a3
ren X wasX
g X=0
predict pr0
replace X=1
predict pr1
drop X
ren wasX X
g dp=pr1-pr0
mean pr0 pr1 dp
bs: dp

On Feb 4, 2008 11:11 PM, Garth Rauscher <> wrote:
> [I tried to send this message to the listserv a few days ago but don't think
> it made it through so I am trying again. I apologize if this is a duplicate
> message.]
> Dear listserve members
> I am attempting to learn how to perform a model-based standardization with
> Stata, using the marginal or predictive margins method.  I would like to be
> able to estimate standardized probabilities and probability differences from
> logistic regression that are standardized to the distribution of modeled
> covariates. The idea is summarized in: "Greenland S. Model-based estimation
> of relative risks and other epidemiologic measures in studies of common
> outcomes and in case-control studies. Am J Epidemiol 2004;160:301-305." To
> the best of my understanding, the method involves estimating predictied
> probabilities of Y under two scenarios (e.g. x=1 and x=0). Assuming we have
> a dependent variable Y(0,1), an exposure of interest X(0,1), and covariates
> r2 r3 a2 a3 a4 a5, two sets of predicted probabiltiies could be:
> P0(x) based on the joint distribution of covariates, with X=0 assigned to
> everyone
> P1(x) based on the joint distribution of covariates, with X=1 assigned to
> everyone
> PD(x) as the difference in probabilities,  P1(x) - P0(x)
> Below is my code.
> logit Y X r2 r3 a2 a3 a4 a5
> // predicted xbetas after assigning all observations to X=0
> g if0=_b[_cons]+_b[X]*0+_b[r2]+_b[r3]+_b[a2]+_b[a3]+_b[a4]+_b[a5]
> // predicted xbetas after assigning all observations to X=1
> g if1=_b[_cons]+_b[X]*1+_b[r2]+_b[r3]+_b[a2]+_b[a3]+_b[a4]+_b[a5]
> // predicted probabilities
>       g p0x = invlogit(if0)
>       g p1x = invlogit(if1)
> I was expecting two new variables of predictied probabilities, p0x and p1x
> with a range of values that depended on covariates. However, I noticed that
> p0x and p1x each had only one value instead of a range of values as I had
> expected (see above).  Any clarification as to what I am doing incorrectly
> would be appreciated. I think my next task would have been to perform
> bootstrapping to get confidence intervals from the distribution of means for
> p0x, p1x and PD(x).
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index