Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Arne Risa Hole <arnehole@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Obtaining marginal effects and their standard errors after
Date: Wed, 9 Jan 2013 10:43:35 +0000

Dear Vince,

Thanks for posting this. I found it very illuminating, in particular the
clever uses of the contrast features of -margins-.

Best wishes,

Arne

On 8 January 2013 23:50, Vince Wiggins, StataCorp <vwiggins@stata.com> wrote:
> Arne Risa Hole <arnehole@gmail.com> and Richard Williams
> <richardwilliams.ndu@gmail.com> have had an illuminating exchange about
> the computation and meaning of interaction effects on the probability
> of a positive outcome in models with a binary response. The discussion
> applies to any response that is not a linear combination of the
> coefficients, but let's stick with probabilities. I have a few related
> thoughts and also want to show off some of -margins-' lesser-known
> features using Arne's clever examples.
>
> Richard wonders "why margins does not provide marginal effects for
> interactions". We have nothing against so-called "interaction
> effects", though as Richard notes they are a funny kind of effect. You
> cannot change an interaction directly, you can only change its
> constituent pieces. (Ergo, why "interaction effects" are so called.)
> You can, however, interpret an interaction, and as Arne notes, that
> interpretation is just the change in the slope of one variable as the
> other variable itself changes,
>
>                      d(y)
>     interaction = ----------
>                   d(x1)d(x2)
>
> What I will dub "own interactions", an interaction of a variable with
> itself, have a long history in physics. The slope of distance with
> respect to time being velocity,
>
>                d(distance)
>     velocity = -----------
>                  d(time)
>
> and the interaction of that slope with time itself being acceleration,
>
>                     d(distance)     d(distance)
>     acceleration = -------------- = -----------
>                    d(time)d(time)    d^2(time)
>
> An "own interaction" does not have the problem that we are required to
> think of changing the interaction itself. There is only one variable
> to change. Moreover, we rarely have such nice descriptions of our
> interactions, own or otherwise.
> When we regress mileage on weight and weight squared, we are simply
> admitting that a linear relationship doesn't match the data, and we
> need some flexibility in the relationship between mileage and weight.
> We do not think that weight squared has its own interpretation.
>
> In such cases, I am a fan of visualizing the relationships over a range
> of meaningful values, rather than trying to create a single number that
> summarizes the "interaction effect". We know that the effects differ
> for different levels of the interacted variables and for different
> levels of other variables. Best to admit this and evaluate the
> response at different points. As Richard points out, "the problem with
> any `average' number (AME or MEM) is that it disguises a great deal of
> individual level variability ... That is why I like MERs (marginal
> effects at representative values), or else APRs (a term I made up)
> which stands for Adjusted Predictions at Representative Values." Me
> too.
>
> Richard's slides on using -margins- in this context should be required
> reading,
>
>     http://www.nd.edu/~rwilliam/stats/Margins01.pdf
>
> as should his Stata Journal article,
>
>     http://www.statajournal.com/article.html?article=st0260
>
> If you are trying to test whether an interaction term in your model is
> statistically significant, do that in the metric in which you estimated
> the model. That is to say, look at the test statistics on the
> interaction term.
>
> One thing to keep in mind is that with a nonlinear response (e.g.,
> probabilities in a probit or logit model) you have an interaction
> effect between your covariates even when you do not have an interaction
> term in the model. The probability is an S-shaped response in Xb, so,
> as any covariate changes, it pushes the response of the other
> covariates into either one of the tails, where the response is
> attenuated, or toward the center, where the response is strengthened.
>
> Try this example,
>     . webuse margex
>     . probit outcome age distance
>     . margins, dydx(age) at(distance=(0(100)800))
>     . marginsplot
>
> We estimated a model with no interaction, yet when we graph the
> marginal effect of age over a range of distances, we find a strong
> downward trend in the change in probability for a change in age as
> distance increases.
>
> Even more fun, try this example,
>
>     . clear
>     . set seed 12345
>     . set obs 5000
>
>     . gen x = runiform() - .5
>     . gen z = runiform() - .5
>     . gen xb = x + 8*z
>     . gen y = 1 / (1 + exp(xb)) < uniform()
>
>     . logit y x z
>
>     . margins, dydx(x) at(z=(-.5(.1).5))
>     . marginsplot
>
> Again, we have no interaction term in the model, but plenty of
> "interaction effect" on the probability. The marginal effect of x on
> the probability traces out a nice bell-shaped curve as z increases:
> it first rises as z rises, then peaks and falls as z continues to
> rise. The "interaction" is pronounced, the marginal effect rising
> from near 0 to about .25, then falling back to 0.
>
> Despite this pronounced "interaction", if we were to compute the
> average "interaction effect", it would be 0 (at least asymptotically).
> It is 0 because the positive and negative interactions sum to 0 in
> this example. This is directly analogous to the well-worn example of
> fitting a linear model to quadratic data and finding no relationship.
> That is why I do not like to talk about continuous-continuous
> "interaction effects" as a single value. I would rather explore the
> MERs or APRs.
>
> These graphs are as we would expect. Logit (and probit) probabilities
> look like,
>
>     pr = f(Xb)
>
> where f() is a monotonically increasing function of Xb that asymptotes
> to 0 as Xb -> -infinity and asymptotes to 1 as Xb -> +infinity. That
> is to say, it is an S-shaped function in Xb.
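The bell-shaped profile in this simulated example can be checked directly from the model's true coefficients. A minimal Python sketch (an editorial aside, not part of the original post), assuming the true index xb = 1*x + 8*z so that the logit marginal effect of x is p*(1-p)*1:

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# true coefficients of the simulated model: xb = 1*x + 8*z
bx, bz = 1.0, 8.0

def me_x(x, z):
    """Marginal effect of x on P(y=1): d(pr)/d(x) = p*(1-p)*bx."""
    p = logistic(bx * x + bz * z)
    return p * (1.0 - p) * bx

# trace the marginal effect of x (holding x at 0) over the range of z,
# mirroring -margins, dydx(x) at(z=(-.5(.1).5))-
grid = [-0.5 + 0.1 * i for i in range(11)]
profile = [me_x(0.0, z) for z in grid]

# bell-shaped: near 0 in the tails, peaking at 0.25 where xb = 0 (z = 0)
print(min(profile), max(profile))

# the "interaction" d(me)/d(z) changes sign over the grid, so its
# average over the symmetric grid is (essentially) zero
eps = 1e-6
inter = [(me_x(0.0, z + eps) - me_x(0.0, z - eps)) / (2 * eps) for z in grid]
print(sum(inter) / len(inter))
```

The profile rises from about .018 to .25 and falls back, matching the "near 0 to about .25" range described above, while the averaged interaction cancels to zero.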
> If z is a covariate in the set of covariates X, then,
>
>     marginal effect of z = d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z)
>
> So, every marginal effect also includes a contribution from all
> other covariates in the model (the X in Xb). In fact, d(pr)/d(Xb) will
> always map out the bell-shaped curve over a sufficient range of Xb.
> So, all logit and probit models have an interaction by construction,
> even when we do not introduce interaction terms.
>
> These built-in interactions from nonlinear responses lie at the heart
> of Ai and Norton's (2003) protracted explorations of interactions.
>
> These nonlinearities do not exist in the natural metric of the model.
> If we think of the response of the probit model as being a
> one-standard-deviation change in the latent response (index value if
> you prefer GLM), then we have no nonlinearities, and we can directly
> interpret our coefficients. The case is even more compelling for
> logistic models, where the parameter estimates can be expressed as
> odds ratios that do not change as the levels of other variables
> change. Maarten Buis has championed this approach many times on the
> list, e.g.,
>
>     http://www.stata.com/statalist/archive/2010-08/msg00968.html
>
> with reference to an associated Stata Journal article,
>
>     http://www.maartenbuis.nl/publications/interactions.html
>
> Even so, changes in probability, or another nonlinear response, can
> often be useful in characterizing a model. And, you say, you still
> want an "interaction effect" on a nonlinear response. -margins- can
> directly compute these effects for any number of interactions of
> indicator or factor-variable covariates and for interactions of those
> with a continuous covariate. It cannot directly compute the effects
> of continuous-continuous interactions. Given what we have seen above,
> I contend that continuous-continuous interactions are the least useful
> interactions and those most likely to obscure important relationships.
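The chain rule quoted above is easy to verify numerically. A short Python sketch (an editorial aside; the index coefficients b0, bx, bz are hypothetical) compares the analytic marginal effect p*(1-p)*bz against a finite-difference derivative of the logit probability:

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# hypothetical index: Xb = b0 + bx*x + bz*z
b0, bx, bz = -0.3, 1.0, 8.0

def pr(x, z):
    return logistic(b0 + bx * x + bz * z)

# chain rule: d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z) = p*(1-p) * bz
x0, z0 = 0.4, 0.1
p = pr(x0, z0)
analytic = p * (1.0 - p) * bz

# cross-check against a central-difference numerical derivative in z
eps = 1e-6
numeric = (pr(x0, z0 + eps) - pr(x0, z0 - eps)) / (2 * eps)
print(analytic, numeric)
```

The d(pr)/d(Xb) factor, p*(1-p), is common to every covariate's marginal effect, which is exactly why moving any one covariate along the S-curve reshapes the effects of all the others.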
> That said, Arne has shown how to creatively use -margins- to
> numerically compute the pieces of a continuous-continuous interaction,
> and then assemble the interaction yourself. I have a simplification of
> Arne's example for those wanting the effects computed at the means of
> the covariates.
>
> Set up the dataset, and run the probit model,
>
>     . sysuse auto, clear
>     . replace weight = weight/1000
>     . replace length = length/10
>     . probit foreign weight length c.weight#c.length, nolog
>
> Rather than,
>
>     . margins, dydx(*) atmeans at(weight=3.019559)
>     . matrix b = r(b)
>     . scalar meff_turn_1 = b[1,2]
>
>     . margins, dydx(*) atmeans at(weight=3.019459)
>     . matrix b = r(b)
>     . scalar meff_turn_0 = b[1,2]
>
>     . di (meff_turn_1 - meff_turn_0) / 0.0001
>
> you could use the -margins- contrast operator to take the
> difference between the marginal effect for the two values of
> weight,
>
>     . margins, dydx(length) atmeans at(weight=3.019459) at(weight=3.019559)
>     >     contrast(atcontrast(r._at)) post
>     . margins, coeflegend
>     . nlcom _b[r2vs1._at] / .0001
>
> One tricky part of the -margins- command is -at(weight=3.019459)
> at(weight=3.019559)-. We are simply evaluating the derivative
> -dydx(length)- at the mean of weight and at the mean of weight plus a
> small epsilon, so we can numerically take the cross derivative w.r.t.
> weight. A second tricky part is -contrast(atcontrast(r._at))-. We are
> asking for the contrast (difference) in the two at() values we
> specified for weight. We use the -post- option of -margins- to post
> the results as estimation results, then use -nlcom- to divide by our
> epsilon.
>
> I typed -margins, coeflegend- only because we would never know that we
> need to refer to the estimated difference as _b[r2vs1._at] without that
> legend. The simplified technique has the added benefit of providing
> confidence intervals on the estimate.
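The epsilon arithmetic behind that contrast can be reproduced outside Stata. A Python sketch (an editorial aside; the probit coefficients and evaluation point below are hypothetical, not the auto.dta estimates) takes the same one-sided finite difference of dydx at two nearby values and compares it with the analytic cross-partial:

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

# hypothetical probit index with an interaction term:
# P = F(b0 + b1*x1 + b2*x2 + b3*x1*x2)
b0, b1, b2, b3 = 0.2, -0.8, 0.5, 0.6

def xb(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def dpdx2(x1, x2):
    """Marginal effect of x2, as -margins, dydx()- would report it."""
    return norm_pdf(xb(x1, x2)) * (b2 + b3 * x1)

# the trick above: evaluate dydx(x2) at x1 and at x1 + epsilon,
# difference, and divide by epsilon
x1, x2, eps = 0.5, 0.3, 1e-4
numeric = (dpdx2(x1 + eps, x2) - dpdx2(x1, x2)) / eps

# analytic cross-partial for comparison (using phi'(t) = -t*phi(t))
t = xb(x1, x2)
analytic = -t * norm_pdf(t) * (b1 + b3 * x2) * (b2 + b3 * x1) + norm_pdf(t) * b3
print(numeric, analytic)
```

As the post notes, a little sensitivity testing of the epsilon (here 1e-4, matching the .0001 above) is worthwhile: too large and the truncation error grows, too small and round-off dominates.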
> Given that we know the exact form of the probability, we would still
> get the most accurate results using the method described in the FAQ
> that led to the original question in this thread,
>
>     http://www.stata.com/support/faqs/statistics/marginal-effects-after-interactions/
>
> Although I agree with Arne that the numerical example using -margins-
> is mostly pedagogical, I admit that in the dark ages, before -margins-
> existed, I regularly performed such computations. With a little
> sensitivity testing of the epsilon used to compute the derivative
> (.0001 above), these can be accurate estimates.
>
> We can use Arne's example of a continuous-factor interaction to show
> how to estimate the "interaction effect" using only -margins-. I am
> again showing Arne's full example, because it makes clear what
> -margins- is computing.
>
> Set up the dataset, and run the probit model,
>
>     . sysuse auto, clear
>     . set seed 12345
>     . generate dum = uniform() > 0.5
>     . probit foreign turn i.dum i.dum#c.turn, nolog
>
> Rather than,
>
>     . margins, dydx(*) atmeans at(dum=1)
>     . matrix b = r(b)
>     . scalar meff_turn_dum1 = b[1,1]
>
>     . margins, dydx(*) atmeans at(dum=0)
>     . matrix b = r(b)
>     . scalar meff_turn_dum0 = b[1,1]
>
>     . di meff_turn_dum1 - meff_turn_dum0
>
> use -margins-' contrast operator to compute the interaction,
>
>     . margins r.dum, dydx(turn) atmeans
>
> With this approach, we can remove the -atmeans- option and estimate the
> average "interaction effect", rather than the "interaction effect" at
> the means,
>
>     . margins r.dum, dydx(turn)
>
> These "interaction effects" do not bother me in the same way a
> continuous-continuous "interaction effect" does. Why? Because there
> are only two values for the variable dum. That means we have
> completely explored the interaction space of the two variables dum and
> turn.
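The continuous-factor contrast needs no epsilon at all: because dum takes only two values, the "interaction effect" is an exact difference of two marginal effects. A Python sketch (an editorial aside; coefficients and the evaluation point for turn are hypothetical, not the estimates from the seeded example):

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# hypothetical probit: P(foreign=1) = F(b0 + b1*turn + b2*dum + b3*dum*turn)
b0, b1, b2, b3 = 3.0, -0.1, 1.2, -0.04

def dpdturn(turn, dum):
    """Marginal effect of turn, as -margins, dydx(turn)- would report it."""
    return norm_pdf(b0 + b1 * turn + b2 * dum + b3 * dum * turn) * (b1 + b3 * dum)

turn0 = 40.0  # an illustrative evaluation point for turn

# the -margins r.dum, dydx(turn)- contrast: the marginal effect of turn
# at dum = 1 minus the marginal effect at dum = 0
interaction = dpdturn(turn0, 1) - dpdturn(turn0, 0)
print(interaction)

# sanity check: dpdturn agrees with a numerical derivative of the probability
eps = 1e-6
num = (norm_cdf(b0 + b1 * (turn0 + eps) + b2 + b3 * (turn0 + eps))
       - norm_cdf(b0 + b1 * (turn0 - eps) + b2 + b3 * (turn0 - eps))) / (2 * eps)
print(num)
```

Averaging that difference over the sample, instead of evaluating it at one point, is precisely what dropping -atmeans- does.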
> It does not mean that we have explored how the marginal effect
> of turn varies with its own values or those of other covariates in the
> model, and that is why I would still look at the MERs and APRs.
>
> Factor-factor "interaction effects" can also be estimated using the
> contrast operators.
>
> For the model,
>
>     . logit A##B ...
>
> type,
>
>     . margins r.A#r.B
>
> to estimate the average "interaction effect", or, to estimate the
> "interaction effect" at the means, type,
>
>     . margins r.A#r.B, atmeans
>
> This naturally extends to multiway interactions,
>
>     . logit A##B##C ...
>
>     . margins r.A#r.B#r.C
>
> Again, these "interaction effects" do not bother me in the way
> continuous-continuous interactions do. With factor variables, the
> interactions are exploring the complete space of results. Even so, I
> still like to look at the margins (estimated means),
>
>     . margins A#B#C
>
> It has been my experience that the contrast operators and other
> contrast features added to -margins- in Stata 12 have gone largely
> unnoticed. I am glad Arne's examples provided a platform to
> demonstrate what they do.
>
>
> -- Vince
>    vwiggins@stata.com
>
>
> Ai, C. R., and E. C. Norton. 2003. Interaction terms in logit and
> probit models. Economics Letters 80(1): 123-129.
>
> Buis, M. L. 2010. Stata tip 87: Interpretation of interactions in
> nonlinear models. The Stata Journal 10(2): 305-308.
>
> Williams, R. 2012. Using the margins command to estimate and interpret
> adjusted predictions and marginal effects. The Stata Journal 12(2):
> 308-331.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
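For two binary factors, the r.A#r.B contrast reduces to a double difference of predicted probabilities. A Python sketch (an editorial aside; the logit coefficients are hypothetical) makes the arithmetic explicit:

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# hypothetical logit with two binary factors:
# Xb = b0 + bA*A + bB*B + bAB*A*B
b0, bA, bB, bAB = -0.5, 0.8, 0.4, -1.1

def pr(a, b):
    return logistic(b0 + bA * a + bB * b + bAB * a * b)

# the double difference behind -margins r.A#r.B-:
# (change in P from A when B = 1) minus (change in P from A when B = 0)
interaction = (pr(1, 1) - pr(0, 1)) - (pr(1, 0) - pr(0, 0))
print(interaction)

# even with no interaction term (bAB = 0) the double difference of
# probabilities is generally nonzero, because the logistic curve bends
no_term = ((logistic(b0 + bA + bB) - logistic(b0 + bB))
           - (logistic(b0 + bA) - logistic(b0)))
print(no_term)
```

The second calculation restates the post's main point in miniature: the S-shape alone manufactures an "interaction effect" on the probability scale, with or without an interaction term in the index.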
