Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joerg Luedicke <joerg.luedicke@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Margins in STATA 12 - how to use at () for dichotomous variables? |
Date | Wed, 19 Dec 2012 20:36:10 -0500 |
On Wed, Dec 19, 2012 at 12:08 PM, Neeraj Iyer <iyeranna@gmail.com> wrote: > 1) When gender is held at value=1, the marginal effect of x1 is for > either all men or all women. Yes, depending on what is coded as 1. > 2) When gender is held at its mean, the marginal effect of x1 is > controlling for gender based on the proportion of those coded as '1'. I would not use the term "controlling" here. What it simply means is that you plug in the mean of the gender variable into the regression equation when doing the predictions. Say you have 50% males and 50% females, then - strictly speaking - you would get marginal effects for people who are half male and half female. > 3) When gender is averaged over men and women, the marginal effect of > x1 is the arithmetic mean of the marginal effect of men and that of > women. Not quite. It is essentially a weighted sum of the predictions for males and those for females, where the predictions for males are weighted with the proportion of males and the predictions for females are weighted with the proportion of females. To see this, we can look at an example. In order to simplify things, lets say we have two binary predictors, say treatment and gender, and want to predict counts for both treatment arms, averaged over males and females. First we generate some data: //Some toy data (I use some modified code from Joseph Hilbe //(http://www.stata.com/statalist/archive/2011-06/msg00183.html) clear set obs `=1e4' set seed 1234 gen treat = runiform()>.5 gen gender = runiform()>.5 //Count component gen xb = 2 + 0.5*treat - 0.5*gender gen a = .5 gen ia = 1/a gen exb = exp(xb) gen xg = rgamma(ia, a) gen xbg = exb * xg gen nby = rpoisson(xbg) //Binary component gen pi =1/(1+exp(-(0.5*treat + 0.5*gender + 0.2))) gen bernoulli = runiform()>pi gen zy = bernoulli*nby rename zy y * --> fit the model: //Model fit zinb y i.treat gender, inflate(i.treat gender) * --> ...and compute marginal counts for both treatment arms for gender=0, gender=1: //Manual calculation of marginal counts *marginal counts for control and treatment, for gender=0 margins i.treat, at(gender=0) gen cou_cont_g0=exp(2.0276) gen cou_trt_g0=exp(2.0276 + .4674933) gen p_cont_g0 = 1 - ( exp(.2195245) / (1+exp(.2195245) ) ) gen p_trt_g0 = 1 - ( exp(.2195245 + .4923645) / (1+exp(.2195245 + .4923645) ) ) gen cont_g0 = cou_cont_g0 * p_cont_g0 gen trt_g0 = cou_trt_g0 * p_trt_g0 sum cont_g0 trt_g0 *marginal counts for control and treatment, for gender=1 margins i.treat, at(gender=1) gen cou_cont_g1=exp(2.0276 - .4888359) gen cou_trt_g1=exp(2.0276 + .4674933 - .4888359) gen p_cont_g1 = 1 - ( exp(.2195245 + .4601969) / (1+exp(.2195245 + .4601969) ) ) gen p_trt_g1 = 1 - ( exp(.2195245 + .4923645 + .4601969) / (1+exp(.2195245 + .4923645 + .4601969) ) ) gen cont_g1 = cou_cont_g1 * p_cont_g1 gen trt_g1 = cou_trt_g1 * p_trt_g1 sum cont_g1 trt_g1 * --> Finally, we compute averaged marginal counts as a weighted sum of the predictions for both gender categories: *marginal counts for control and treatment, averaged over gender margins i.treat sum gender gen cont = (cont_g0 * (1-r(mean)) + cont_g1 * r(mean)) gen trt = (trt_g0 * (1-r(mean)) + trt_g1 * r(mean)) sum cont trt Note that the same principle applies to average marginal effects (as opposed to marginal counts). In the above example with a binary variable, the marginal effects are simply the differences between marginal counts for the two treatment arms. For example, margins, dydx(treat) would correspond to: margins, dydx(treat) at(gender=0) margins, dydx(treat) at(gender=1) sum gender di .607908 * (1-r(mean)) + .191445 * r(mean) However, I doubt that average marginal effects are a good choice for continuous predictors in non-linear models, especially so in a model like the ZINB. Consider the following example with a zero-inflated outcome and one continuous predictor: clear set obs `=1e4' set seed 1234 gen x = rnormal() //Count component gen xb = 2 + 0.2*x gen a = .5 gen ia = 1/a gen exb = exp(xb) gen xg = rgamma(ia, a) gen xbg = exb * xg gen nby = rpoisson(xbg) //Binary component gen pi =1/(1+exp(-(0.5*x + 0.2))) gen bernoulli = runiform()>pi gen zy = bernoulli*nby rename zy y //Model fit zinb y x, inflate(x) The average marginal effect of x would be: margins, dydx(x) which indicates that per unit increase in x, we expect y to decrease by roughly a fifth of a count, i.e. we would assume a linear effect. However, if we visualize the effect of x on y by plotting fitted values of y predict yhat as a function of x: line yhat x, sort we can see that the effect of x on y is all but linear in this example and that the average marginal effect is a poor representation of this effect. Average marginal effects of this kind should probably be avoided altogether in non-linear models. If you would be interested in the relation of x on y at certain values of age, you could do several predictions and change the value for age in each of the predictions. The same strategy would work if you had an age*x interaction effect. Joerg On Wed, Dec 19, 2012 at 12:08 PM, Neeraj Iyer <iyeranna@gmail.com> wrote: > Thank you Mr. Luedicke. I ran the codes that you sent. Here is my > understanding of the output: > > 1) When gender is held at value=1, the marginal effect of x1 is for > either all men or all women. > 2) When gender is held at its mean, the marginal effect of x1 is > controlling for gender based on the proportion of those coded as '1'. > 3) When gender is averaged over men and women, the marginal effect of > x1 is the arithmetic mean of the marginal effect of men and that of > women. > > I reckon from a substantive standpoint, 1) or 3) are more informative > than 2). If I were interested in the marginal effect of x1 at 10 year > intervals of age, would it then be prudent to use 3)? > > Thank you, > > Regards, > Neeraj > > On Fri, Dec 14, 2012 at 11:05 AM, Joerg Luedicke > <joerg.luedicke@gmail.com> wrote: >> >> clear >> set obs `=1e4' >> set seed 1234 >> gen x1 = rnormal() >> gen gender = runiform()>.5 >> >> //Count component >> gen xb = 2 + 0.5*x1 - 0.5*gender >> gen a = .5 >> gen ia = 1/a >> gen exb = exp(xb) >> gen xg = rgamma(ia, a) >> gen xbg = exb * xg >> gen nby = rpoisson(xbg) >> >> //Binary component >> gen pi =1/(1+exp(-(0.5*x1 + 0.5*gender + 0.2))) >> gen bernoulli = runiform()>pi >> gen zy = bernoulli*nby >> rename zy y >> >> //Model fit >> zinb y x1 gender, inflate(x1 gender) >> >> //OP's -margins- statements >> margins, dydx(x1) at(gender=1) >> margins, dydx(x1) at((mean) gender) >> >> //Now consider the following which is similar to your second statement: >> sum gender >> margins, dydx(x1) at(gender=.4983) >> >> //And compare it to the following, where the marginal effects are >> averaged over men and women: >> margins, dydx(x1) > > > > > -- > Neeraj * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/