
# Re: st: problem with factor variable and margins.

From: vwiggins@stata.com (Vince Wiggins, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: problem with factor variable and margins.
Date: Wed, 17 Mar 2010 14:55:34 -0500

```
Rich Steinberg <rsteinbe@iupui.edu> has found an inconsistency in the way the
-dydx()- option of -margins- handles the varlist that it takes as its
argument.  -dydx()- is the option that specifies the covariates in the model
for which you want marginal effects computed.  It currently fails to identify
factor variables in that list when there are other variables in the model that
begin with the name of the factor variable.

Rich's example is a relatively complicated -tobit- regression, but we can
reproduce the problem with the auto dataset and -regress-.  If we type,

. sysuse auto, clear
. regress mpg i.foreign
. margins, dydx(foreign)

-margins- reports the marginal effect (or, more precisely, the discrete-change
effect) of -foreign- on our response.

If, however, we add a covariate that begins with "foreign" to our model, for
example

. gen foreign_tr = foreign*trunk
. regress mpg i.foreign foreign_tr

Now, typing,

. margins, dydx(foreign)

does NOT produce the marginal effect for -foreign-.  It produces only the
marginal effect for -foreign_tr-.  The variable -foreign_tr- causes -margins-
to ignore the factor variable -foreign-.

This is a mechanical parsing problem, and we will fix this behavior in a
future update.

Until the problem is fixed, there are two easy workarounds.

Rich is taking advantage of the fact that the -dydx()- option of -margins-
treats the base name of a factor variable as though you had typed the full
list of its non-base levels.  So, when we type,

. margins, dydx(foreign)

Stata sees

. margins, dydx(1.foreign)

This shorthand notation generalizes to factor variables with multiple levels.

. margins, dydx(rep78)

is equivalent to

. margins, dydx(2.rep78 3.rep78 4.rep78 5.rep78)

The first workaround: until we have improved the parsing, Rich can specify
the levels of his factors explicitly.  For example,

. margins, dydx(1.foreign)

The second workaround, if Rich is willing to tolerate seeing marginal effects
for every covariate in his model, is to request marginal effects for all of
them,

. margins, dydx(*)

Phil Ender <ender97@gmail.com> raises a more substantive issue,

> I don't think the problem is with -margins- but with the fact that
> you created the interaction outside of your model.  If you create
> the interaction using the factor variables in the model then
> -margins- not only identifies all of the terms but produces marginal
> effects that take into account the interactions.  Without
> reproducing your entire model, it would look something like this:
>
> . tobit dv i.naic c.ainc i.naic#c.ainc ...

Rich currently creates some of the interactions in his model by generating
variables for those interactions before estimating the model.  This is what
we did above with the variable -foreign_tr-.

. regress mpg i.foreign foreign_tr trunk
. margins, dydx(1.foreign)

If, however, we estimate an equivalent regression model, using factor-variable
notation to create the interactions,

. regress mpg i.foreign##c.trunk

then request the marginal effect of -foreign-,

. margins, dydx(1.foreign)

we are asking -margins- to estimate two very different things.  Here are the
results after the first estimation,

. margins, dydx(1.foreign)

Average marginal effects                          Number of obs   =         74
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.foreign

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   1.foreign |  -.5136233   4.389711    -0.12   0.907    -9.117299    8.090052
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

And here are the results after the second,

. margins, dydx(1.foreign)

Average marginal effects                          Number of obs   =         74
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.foreign

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   1.foreign |   3.116336   1.401444     2.22   0.026     .3695569    5.863116
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

As Phil noted, after our first estimation -margins- does not, and cannot,
know that there is a relationship between -foreign- and -foreign_tr-.  To
-margins-, -foreign_tr- is just another covariate, so when it computes the
discrete change for -foreign- it holds -foreign_tr- at its observed values.
After our second estimation, however, -margins- knows about the interaction
between -foreign- and -trunk-.  When it switches -foreign- from 0 to 1, it
also switches the interaction term, so it reports the total effect of
-foreign- on the response, including the part that works through the
interaction.

-- Vince
vwiggins@stata.com

```
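The gap between the two -margins- results above can be sketched algebraically. This is our own illustration, not part of the original post. It assumes the linear prediction that -margins- uses after -regress-, averaged over the $N$ estimation-sample observations. The symbols are hypothetical notation: $f_i$ is -foreign-, $t_i$ is -trunk-, $x_i = f_i t_i$ is the hand-made -foreign_tr-, and $b_k$ are the fitted coefficients. Both regressions contain the same four columns (constant, $f_i$, $t_i$, $f_i t_i$), so they produce identical coefficients, and one set of $b_k$ serves for both.

$$
\begin{aligned}
\text{Model 1 (hand-made } x_i\text{):}\quad \hat{y}_i &= b_0 + b_1 f_i + b_2 x_i + b_3 t_i \\
\text{AME}_1 &= \frac{1}{N}\sum_{i=1}^{N}\bigl[\hat{y}_i(f_i{=}1) - \hat{y}_i(f_i{=}0)\bigr]_{x_i \text{ held fixed}} = b_1 \\
\text{Model 2 (factor notation):}\quad \hat{y}_i &= b_0 + b_1 f_i + b_3 t_i + b_2 f_i t_i \\
\text{AME}_2 &= \frac{1}{N}\sum_{i=1}^{N}\bigl[\hat{y}_i(f_i{=}1) - \hat{y}_i(f_i{=}0)\bigr] = \frac{1}{N}\sum_{i=1}^{N}\bigl(b_1 + b_2 t_i\bigr) = b_1 + b_2 \bar{t}
\end{aligned}
$$

Under these assumptions, the first figure, -.5136233, is $b_1$ alone. That is the effect of -foreign- when -trunk- equals 0. The second figure, 3.116336, is $b_1 + b_2\bar{t}$, the effect of -foreign- averaged over the observed values of -trunk-.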