



Re: st: Obtaining marginal effects and their standard errors after


From   [email protected] (Vince Wiggins, StataCorp)
To   [email protected]
Subject   Re: st: Obtaining marginal effects and their standard errors after
Date   Tue, 08 Jan 2013 17:50:12 -0600

Arne Risa Hole <[email protected]> and Richard Williams
<[email protected]> have had an illuminating exchange about
the computation and meaning of interaction effects on the probability
of a positive outcome in models with a binary response.  The discussion
applies to any response that is not a linear combination of the
coefficients, but let's stick with probabilities.  I have a few related
thoughts and also want to show off some of -margins-' lesser-known
features using Arne's clever examples.

Richard wonders "why margins does not provide marginal effects for
interactions".  We have nothing against so-called "interaction
effects", though as Richard notes they are a funny kind of effect.  You
cannot change an interaction directly; you can only change its
constituent pieces.  (Hence the quotation marks around "interaction
effects".)
You can, however, interpret an interaction, and as Arne notes, that
interpretation is just the change in the slope of one variable as the
other variable itself changes,

	                 d^2(y)
	interaction = -----------
	              d(x1) d(x2)

What I will dub "own interactions", an interaction of a variable with
itself, have a long history in physics.  The slope of distance with
respect to time is velocity,

	           d(distance)
	velocity = -----------
	             d(time)

and the interaction of time with itself is acceleration,

	                d^2(distance)    d^2(distance)
	acceleration = --------------- = -------------
	               d(time) d(time)     d(time)^2

An "own interaction" does not have the problem that we are required to
think of changing the interaction itself.  There is only one variable
to change.  In practice, however, we rarely have such nice descriptions of our
interactions, own or otherwise.  When we regress mileage on weight and
weight squared, we are simply admitting that a linear relationship
doesn't match the data, and we need some flexibility in the
relationship between mileage and weight.  We do not think that weight
squared has its own interpretation.

In such cases, I am a fan of visualizing the relationships over a range
of meaningful values, rather than trying to create a single number that
summarizes the "interaction effect".  We know that the effects differ
for different levels of the interacted variables and for different
levels of other variables.  Best to admit this and evaluate the
response at different points.  As Richard points out, "the problem with
any `average' number (AME or MEM) is that it disguises a great deal of
individual level variability ...  That is why I like MERs (marginal
effects at representative values), or else APRs (a term I made up)
which stands for Adjusted Predictions at Representative Values."  Me
too.

Richard's slides on using -margins- in this context should be required
reading,

    http://www.nd.edu/~rwilliam/stats/Margins01.pdf

as should his Stata Journal article, 

    http://www.statajournal.com/article.html?article=st0260

If you are trying to test whether an interaction term in your model is
statistically significant, do that in the metric in which you estimated
the model.  That is to say, look at the test statistics on the
interaction term.

One thing to keep in mind is that with a nonlinear response (e.g.,
probabilities in a probit or logit model) you have an interaction
effect between your covariates even when you do not have an interaction
term in the model.  The probability is an S-shaped response in Xb, so,
as any covariate changes, it pushes the response of the other
covariates either into one of the tails, where the response is
attenuated, or toward the center, where the response is
strengthened.

Try this example

    . webuse margex
    . probit outcome age distance
    . margins, dydx(age) at(distance=(0(100)800))
    . marginsplot

We estimated a model with no interaction, yet when we graph the
marginal effect of age over a range of distances, we find a strong
downward trend in the change in probability for a change in age as
distance increases.

Even more fun, try this example,

    . clear
    . set seed 12345
    . set obs 5000

    . gen x = runiform() - .5
    . gen z = runiform() - .5
    . gen xb = x + 8*z
    . gen y = 1 / (1 + exp(xb)) < uniform()

    . logit y x z

    . margins, dydx(x) at(z=(-.5(.1).5))
    . marginsplot

Again, we have no interaction term in the model, but plenty of
"interaction effect" on the probability.  The marginal effect of x on
probability traces out a nice bell-shaped curve as z increases.  The
marginal effect of x on probability first rises as z rises, then peaks
and falls as z continues to rise.  The "interaction" is pronounced, the
marginal effect rising from near 0 to about .25, then falling back to
0.
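The shape is easy to reproduce by hand.  The sketch below is in Python
rather than Stata, purely as an illustration.  It assumes the true
coefficients from the simulation (1 on x, 8 on z) instead of the -logit-
estimates, and it holds x at 0 (roughly its mean), whereas -margins-
averages over the observed values of x, so its numbers will differ
somewhat.

```python
import math

# True coefficients used to simulate the data: xb = 1*x + 8*z, no constant, no x#z term.
B_X, B_Z = 1.0, 8.0


def logistic(t: float) -> float:
    """Logistic CDF: Pr(y = 1) at index value t."""
    return 1.0 / (1.0 + math.exp(-t))


def me_x(x: float, z: float) -> float:
    """Marginal effect of x on Pr(y = 1) for a logit: p*(1 - p)*B_X."""
    p = logistic(B_X * x + B_Z * z)
    return p * (1.0 - p) * B_X


# z = -.5(.1).5 as in the -margins- call; x held at 0 (an assumption, see above).
z_grid = [round(-0.5 + 0.1 * k, 1) for k in range(11)]
effects = {z: me_x(0.0, z) for z in z_grid}

for z, me in effects.items():
    print(f"z = {z:5.1f}   dPr/dx = {me:.4f}")
```

The peak of .25 is simply the coefficient on x times the largest value
p*(1 - p) can take, reached where the index is 0.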

Despite this pronounced "interaction", if we were to compute the
average "interaction effect", it would be 0 (at least asymptotically).
It is 0 because the positive and negative interactions sum to 0 in this
example.  This is directly analogous to the well-worn example of
fitting a linear model to quadratic data and finding no relationship.
That is why I do not like to talk about continuous-continuous
"interaction effects" as a single value.  I would rather explore the
MERs or APRs.
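To see the cancellation numerically, here is a Python sketch under the
same assumptions (true coefficients, no x#z term).  It computes the
cross-partial d^2(pr)/d(x)d(z) = b_x*b_z*p*(1-p)*(1-2p) on an evenly
spaced grid over the square the uniform draws of x and z come from; the
grid is a stand-in for the population, not the simulated sample.  The
positive and negative values cancel in the average, while the average
absolute value stays well away from 0.

```python
import math

# True coefficients from the simulation; the model has no x#z term.
B_X, B_Z = 1.0, 8.0


def logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def cross_partial(x: float, z: float) -> float:
    """d^2 Pr(y=1) / dx dz for a logit with index B_X*x + B_Z*z.

    With p = logistic(xb): d/dz [B_X * p*(1-p)] = B_X * B_Z * p*(1-p)*(1-2p).
    """
    p = logistic(B_X * x + B_Z * z)
    return B_X * B_Z * p * (1.0 - p) * (1.0 - 2.0 * p)


# Midpoints of an n-by-n grid on (-.5, .5)^2, standing in for the uniform draws of x and z.
n = 200
pts = [(i + 0.5) / n - 0.5 for i in range(n)]
values = [cross_partial(x, z) for x in pts for z in pts]

avg_effect = sum(values) / len(values)
avg_abs_effect = sum(abs(v) for v in values) / len(values)
print(f"average cross-partial:   {avg_effect:.6f}")
print(f"average |cross-partial|: {avg_abs_effect:.4f}")
print(f"range: {min(values):.4f} to {max(values):.4f}")
```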

These graphs are as we would expect.  Logit (and probit) probabilities
look like,

     pr = f(Xb)

where f() is a monotonically increasing function of Xb that approaches
0 as Xb -> -infinity and approaches 1 as Xb -> +infinity.  That is to
say, it is an S-shaped function of Xb.

If z is a covariate in the set of covariates X, then,

     marginal effect of z = d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z)

So, every marginal effect also depends on the values of all the other
covariates in the model (the X in Xb).  In fact, d(pr)/d(Xb) always
traces out a bell-shaped curve over a sufficiently wide range of Xb.
So, all logit and probit models have an interaction by construction,
even when we do not introduce interaction terms.
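Carrying the chain rule one step further makes the built-in interaction
explicit.  In the notation below, which is introduced here, x and z
enter Xb linearly with coefficients b_x and b_z, and there is no x#z
term:

```latex
\begin{aligned}
\mathrm{pr} &= f(Xb), \qquad Xb = \cdots + b_x x + b_z z + \cdots \\
\frac{\partial\,\mathrm{pr}}{\partial x} &= f'(Xb)\, b_x \\
\frac{\partial^2 \mathrm{pr}}{\partial x\,\partial z} &= f''(Xb)\, b_x\, b_z \\
\text{logit: } f''(t) &= f(t)\,[1-f(t)]\,[1-2f(t)], \qquad
\text{probit: } f''(t) = -t\,\phi(t)
\end{aligned}
```

For both links f'' is 0 only at Xb = 0, so the cross-partial is nonzero
almost everywhere even though the model contains no interaction term.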

These built-in interactions from nonlinear responses lie at the heart
of Ai and Norton's (2003) protracted explorations of interactions.

These nonlinearities do not exist in the natural metric of the model.
If we think of the response of the probit model as a change in the
latent response, measured in standard deviations (the index value, if
you prefer GLM terms), then we have no nonlinearities, and we can directly interpret our
coefficients.  The case is even more compelling for logistic models,
where the parameter estimates can be expressed as odds ratios that do
not change as the levels of other variables change.  Maarten Buis has
championed this approach many times on the list, e.g.,

    http://www.stata.com/statalist/archive/2010-08/msg00968.html

with reference to an associated Stata Journal article, 

    http://www.maartenbuis.nl/publications/interactions.html
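A quick numerical illustration of the odds-ratio point, again as a
Python sketch with hypothetical logit coefficients (1 on x, 8 on z, no
interaction term): a one-unit change in x multiplies the odds by exp(1)
at every value of z, while the change in probability depends heavily on
z.

```python
import math

# Hypothetical logit coefficients: xb = 1*x + 8*z, no x#z term.
B_X, B_Z = 1.0, 8.0


def logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def odds(p: float) -> float:
    return p / (1.0 - p)


results = []
for z in (-0.4, 0.0, 0.4):
    p0 = logistic(B_X * 0.0 + B_Z * z)  # Pr(y = 1) at x = 0
    p1 = logistic(B_X * 1.0 + B_Z * z)  # Pr(y = 1) at x = 1
    odds_ratio = odds(p1) / odds(p0)
    results.append((z, odds_ratio, p1 - p0))
    print(f"z = {z:4.1f}   odds ratio = {odds_ratio:.4f}   change in Pr = {p1 - p0:.4f}")
```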

Even so, changes in probability, or another nonlinear response, can
often be useful in characterizing a model.  And, you say, you still
want an "interaction effect" on a nonlinear response.  -margins- can
directly compute these effects for any number of interactions of
indicator or factor-variable covariates and for interactions of those
with continuous covariates.  It cannot directly compute the effects
of continuous-continuous interactions.  Given what we have seen above,
I contend that continuous-continuous interactions are the least useful
interactions and those most likely to obscure important relationships.

That said, Arne has shown how to creatively use -margins- to
numerically compute the pieces of a continuous-continuous interaction,
and then assemble the interaction yourself.  I have a simplification of
Arne's example for those wanting the effects computed at the means of
the covariates.

    Set up the dataset, and run the probit model

	. sysuse auto, clear
	. replace weight=weight/1000
	. replace length=length/10
	. probit foreign weight length c.weight#c.length, nolog

    Rather than, 

	. margins, dydx(*) atmeans at(weight=3.019559)
	. matrix b = r(b)
	. scalar meff_turn_1 = b[1,2]

	. margins, dydx(*) atmeans at(weight=3.019459)
	. matrix b = r(b)
	. scalar meff_turn_0 = b[1,2]

	. di (meff_turn_1 - meff_turn_0) / 0.0001

     you could use the -margins- contrast operator to take the
     difference between the marginal effect for the two values of
     weight,

	. margins, dydx(length) atmeans at(weight=3.019459) at(weight=3.019559) contrast(atcontrast(r._at)) post
	. margins, coeflegend
	. nlcom _b[r2vs1._at] / .0001
    
One tricky part of the -margins- command is -at(weight=3.019459)
at(weight=3.019559)-.  We are simply evaluating the derivative
-dydx(length)- at the mean of weight and at the mean of weight plus a
small epsilon, so we can numerically take the cross derivative w.r.t.
weight.  A second tricky part is -contrast(atcontrast(r._at))-.  We are
asking for the contrast (difference) in the two at() values we
specified for weight.  We use the -post- option of -margins- to post
the results as estimation results, then use -nlcom- to divide by our
epsilon.

I typed -margins, coeflegend- only because we would never know that we
need to refer to the estimated difference as _b[r2vs1._at] without that
legend.  The simplified technique has the added benefit of providing
confidence intervals on the estimate.

Given that we know the exact form of the probability, we would still
get the most accurate results using the method described in the FAQ
that led to the original question in this thread,

    http://www.stata.com/support/faqs/statistics/marginal-effects-after-interactions/

Although I agree with Arne that the numerical example using -margins-
is mostly pedagogical, I admit that in the dark ages, before -margins-
existed, I regularly performed such computations.  With a little
sensitivity testing of the epsilon used to compute the derivative
(.0001 above), these can be accurate estimates.
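Here is a sketch of that sensitivity check, written in Python rather
than Stata and using made-up probit coefficients and a made-up
evaluation point (none of these numbers come from the auto data).  It
mirrors the recipe above: take the marginal effect of length at weight
and at weight + eps, divide the difference by eps, and compare with the
analytic cross derivative phi(xb)*[b3 - xb*(b1 + b3*length)*(b2 +
b3*weight)], which follows from phi'(t) = -t*phi(t).

```python
import math


def phi(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


# Hypothetical probit coefficients: foreign = B0 + B1*weight + B2*length + B3*weight*length
B0, B1, B2, B3 = 0.5, -1.0, 0.3, 0.1
W_BAR, L_BAR = 3.0, 1.9  # hypothetical stand-ins for the covariate means


def index(w: float, l: float) -> float:
    return B0 + B1 * w + B2 * l + B3 * w * l


def me_length(w: float, l: float) -> float:
    """d Pr / d length = phi(xb) * (B2 + B3*w)."""
    return phi(index(w, l)) * (B2 + B3 * w)


def cross_exact(w: float, l: float) -> float:
    """d^2 Pr / d length d weight, using phi'(t) = -t*phi(t)."""
    xb = index(w, l)
    return phi(xb) * (B3 - xb * (B1 + B3 * l) * (B2 + B3 * w))


def cross_forward_diff(w: float, l: float, eps: float) -> float:
    """Forward difference in weight of the length marginal effect."""
    return (me_length(w + eps, l) - me_length(w, l)) / eps


exact = cross_exact(W_BAR, L_BAR)
approx = {eps: cross_forward_diff(W_BAR, L_BAR, eps) for eps in (1e-2, 1e-3, 1e-4, 1e-5)}
print(f"analytic: {exact:.8f}")
for eps, val in approx.items():
    print(f"eps = {eps:g}: {val:.8f}  (error {val - exact:+.2e})")
```

The truncation error of a forward difference shrinks roughly in
proportion to eps; push eps too small and floating-point rounding
eventually dominates, which is why it pays to try a few values.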

We can use Arne's example of a continuous-factor interaction to show
how to estimate the "interaction effect" using only -margins-.  I am
again showing Arne's full example, because it makes clear what
-margins- is computing.

    Set up the dataset, and run the probit model

	. sysuse auto, clear
	. set seed 12345
	. generate dum=uniform()>0.5
	. probit foreign turn i.dum i.dum#c.turn, nolog

    Rather than, 

	. margins, dydx(*) atmeans at(dum=1)
	. matrix b = r(b)
	. scalar meff_turn_dum1 = b[1,1]

	. margins, dydx(*) atmeans at(dum=0)
	. matrix b = r(b)
	. scalar meff_turn_dum0 = b[1,1]

	. di meff_turn_dum1 - meff_turn_dum0

    use -margins-' contrast operator to compute the interaction.

	. margins r.dum, dydx(turn) atmeans

With this approach, we can remove the -atmeans- option and estimate the
average "interaction effect", rather than the "interaction effect" at
the means,

	. margins r.dum, dydx(turn)
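The same arithmetic, sketched in Python with hypothetical probit
coefficients and a handful of made-up turn values (not the auto data):
the "interaction effect" is the marginal effect of turn with dum set to
1 minus the marginal effect with dum set to 0, evaluated either at the
mean of turn (as with -atmeans-) or for every observation and then
averaged.

```python
import math


def phi(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


# Hypothetical probit coefficients: foreign = C0 + CT*turn + CD*dum + CDT*dum*turn
C0, CT, CD, CDT = 8.0, -0.25, 1.0, -0.03
# Hypothetical turn values standing in for the estimation sample.
turns = [33.0, 36.0, 38.0, 40.0, 42.0, 45.0, 48.0]


def me_turn(turn: float, dum: int) -> float:
    """d Pr / d turn with dum held at 0 or 1."""
    xb = C0 + CT * turn + CD * dum + CDT * dum * turn
    return phi(xb) * (CT + CDT * dum)


turn_bar = sum(turns) / len(turns)

# Analogue of -margins r.dum, dydx(turn) atmeans-: contrast at the mean of turn.
ie_at_means = me_turn(turn_bar, 1) - me_turn(turn_bar, 0)

# Analogue of -margins r.dum, dydx(turn)-: set dum to 1 and 0 for each observation, then average.
ie_average = sum(me_turn(t, 1) - me_turn(t, 0) for t in turns) / len(turns)

print(f"interaction effect at the mean of turn: {ie_at_means:.5f}")
print(f"average interaction effect:            {ie_average:.5f}")
```

Because the probability is nonlinear in turn, the two summaries need
not agree, which is one more reason to look at the MERs as well.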

These "interaction effects" do not bother me in the same way a
continuous-continuous "interaction effect" does.  Why?  Because there
are only two values for the variable dum.  That means we have
completely explored the interaction space of the two variables dum and
turn.  It does not mean that we have explored how the marginal effect
of turn varies with its own values or those of other covariates in the
model, and that is why I would still look at the MERs and APRs.

Factor-factor "interaction effects" can also be estimated using the
contrast operators.

    For model,

	. logit A##B ...

    type, 

	. margins r.A#r.B


    to estimate the average "interaction effect"

    or, to estimate the "interaction effect" at the means, type

	. margins r.A#r.B, atmeans


This naturally extends to multiway interactions, 

	. logit A##B##C ...

	. margins r.A#r.B#r.C

Again, these "interaction effects" do not bother me in the way
continuous-continuous interactions do.  With factor variables, the
interactions are exploring the complete space of results.  Even so, I
still like to look at the margins (estimated means),

	. margins A#B#C

It has been my experience that the contrast operators and other
contrast features added to -margins- in Stata 12 have gone largely
unnoticed.  I am glad Arne's examples provided a platform to
demonstrate what they do.


 
-- Vince 
   [email protected]



Ai, C. R., and E. C. Norton. 2003. Interaction terms in logit and probit
    models. Economics Letters 80(1): 123-129.

Buis, M. L. 2010. Stata tip 87: Interpretation of interactions in
    nonlinear models. Stata Journal 10(2): 305-308.

Williams, R. 2012. Using the margins command to estimate and interpret
    adjusted predictions and marginal effects. Stata Journal 12(2):
    308-331.


