
st: Re: Stata logit interaction


From   Maarten Buis <[email protected]>
To   stata list <[email protected]>
Subject   st: Re: Stata logit interaction
Date   Tue, 26 Apr 2011 15:40:04 +0100 (BST)

--- On Sat, 23/4/11, Z.L. Deng wrote me privately:
> I've read through your Stata Journal paper (2010) regarding how to
> interpret interaction terms in logit models. You mentioned in the
> final paragraph "I used in this tip a relatively simple example with
> only binary variables and no control variables. However, the basic
> argument still holds when using continuous variables and when control
> variables are added." However I'm a junior logit modeller and have two
> queries which need your kind help:
> 
> (1) I don't know how to judge if an interaction between two continuous
> variables has a significant effect. For example 
> Y(0/1) = a0+a1*X1+a2*X2+a3*X1*X2+.... 
> . logit y x1 x2 x1_x2 ...
> How could I judge if X1*X2 has a significant effect? Could you please
> offer me the subsequent commands as you did in your Stata Journal
> paper, e.g.
> . margins , over(black collgrad) expression(exp(xb())) post
> . lincom 0.black#1.collgrad - 0.black#0.collgrad
> . lincom 1.black#1.collgrad - 1.black#0.collgrad
> 
> (2) Some scholars argue that by centering X1 and X2, we can 
> alleviate the multicollinearity problem. So, I wanted to try 
> Y(0/1) = a0+a1*X1+a2*X2+a3*(X1-mean(X1))*(X2-mean(X2))+.... 
> . logit y x1 x2 x1centered_x2centered ...
>
> Is your method still applicable here? As far as I tried, the
> -inteff- command, which was proposed by Norton et al. (2004), failed
> in that case.

Ziliang:

I forwarded this answer to the Statalist as this type of question
pops up every now and then there. I also forwarded it to 
Edward Norton as part of the question involves his -inteff- program
and he is obviously much more of an expert on that than I am.

To answer your first question, whether I can give an example of a
continuous by continuous interaction in a logit model with other
control variables: consider the example below. 

As always I start with the baseline odds (a convenient trick to 
introduce/refresh the reader's memory on what an odds and an odds 
ratio are). Within the group of persons with an average education 
(grade), experience (ttl_exp), and age, who are white, widowed or 
divorced, and living in the south, we expect to find .46 persons
with a good job for every person with a bad job. 

If one gets a year more education the odds change by a ratio of 
1.25, i.e. they increase by 25%. Similarly a decade increase in
experience (notice that in the data preparation stage I divided 
ttl_exp by 10) leads to a 160% increase in the odds of getting a 
good job. 

The interaction effect says that the effect of education changes 
by a factor of .90, i.e. it decreases by 10%, when one gets a decade
more experience. The test reported next to that coefficient is the
test of the null hypothesis that this factor by which the effect of
education changes equals 1, i.e. the "change" equals 0%. As such it
can be meaningfully interpreted as a test of one operationalization
of the interaction effect. In this case the interaction effect is
negative and (just) significant at the 5% level.

Norton et al. (2004) focus on a different operationalization of the 
interaction effect, in terms of marginal effects rather than odds
ratios. These different operationalizations can lead to apparently
very different and even opposite conclusions. In my Stata tip, Buis
(2010), I tried to make the point that this difference depends on
whether or not you want to control for the baseline odds or
probability.

*-------------------- begin example ----------------------
//================================= data preparation

// load data
sysuse nlsw88, clear

/* Categories 1 & 2 are classified as 1, i.e. 
   "good occupations"
the rest is classified as 0, i.e. "bad occupation"

The categories for occupation are:
           1 Professional/technical
           2 Managers/admin
           3 Sales
           4 Clerical/unskilled
           5 Craftsmen
           6 Operatives
           7 Transport
           8 Laborers
           9 Farmers
          10 Farm laborers
          11 Service
          12 Household workers
          13 Other

*/
gen byte good_occ = occupation < 3 ///
    if occupation < .

// marital status is present in the data
// as two dummy variables, these are
// combined into one categorical variable
// marst so it will work more nicely with
// Stata's new factor variable notation   
gen byte marst = never_married + 2*married
label define marst 0 "widowed/divorced" ///
                   1 "never married"    ///
                   2 "married"
label value marst marst
label variable marst "marital status"

// a trick to report the baseline odds, see
// <http://www.maartenbuis.nl/example_faq/index.html#baseline>
gen byte baseline = 1               

// center variables
sum grade, meanonly
gen c_grade = grade - r(mean)

sum ttl_exp, meanonly
gen c_ttl_exp = (ttl_exp - r(mean))/10

sum age, meanonly
gen c_age = age - r(mean)

//=============================== estimate the model               
logit good_occ c.c_grade##c.c_ttl_exp        ///
      i.race i.south c_age i.marst baseline, ///
      nocons or
*------------------------ end example ----------------------
(For more on examples I sent to the Statalist see: 
http://www.maartenbuis.nl/example_faq )
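
To make the two operationalizations a bit more concrete, here is a
minimal sketch (not part of the original example) that can be run
directly after the -logit- command above. It assumes the model was
fitted with the factor variable notation shown there, so that the
interaction coefficient is named c.c_grade#c.c_ttl_exp. The -lincom-
lines report the odds ratio of education at average experience and at
a decade above average; going by the rounded numbers quoted earlier,
the second should be roughly 1.25*.90 = 1.125. The -margins- line is
only in the spirit of the marginal effects view: it shows how the
average marginal effect of education on the probability differs
across experience levels, and is not a replacement for -inteff-.

```stata
*------- begin example (run after the -logit- above) -------
// odds ratio of a year of education at average experience
lincom c_grade, or

// odds ratio of a year of education at a decade above
// average experience: exp(b_grade + 1*b_interaction)
lincom c_grade + c.c_grade#c.c_ttl_exp, or

// Wald test that the interaction odds ratio equals 1
test c.c_grade#c.c_ttl_exp

// average marginal effect of education on the probability
// a decade below, at, and a decade above average experience
margins, dydx(c_grade) at(c_ttl_exp=(-1 0 1))
*------------------------ end example ----------------------
```

The ratio of the two odds ratios from -lincom- is the interaction
odds ratio reported in the -logit- output, which is why testing that
single coefficient is a test of this operationalization.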

As to your second question, I can use -inteff- with centered 
variables as you can see in the example below. One thing I 
can imagine is that you have an old version of -inteff-. It
appears that the most current version of that program can
be obtained from <http://www.unc.edu/~enorton/>.
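
A quick way to check which copy of -inteff- Stata is actually using,
as a sketch: -which- reports where the ado-file lives and prints any
version lines it contains, and -findit- lists the places it can be
installed from. What the version lines look like depends on the file
itself, so no output is shown here.

```stata
// show the location and version lines of the installed -inteff-
// (this returns an error if -inteff- is not installed at all)
which inteff

// search for available sources of -inteff-
findit inteff
```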

*----------------------- begin example ----------------------
// load data
sysuse nlsw88, clear

gen byte good_occ = occupation < 3 ///
    if occupation < .

// center variables
sum grade, meanonly
gen c_grade = grade - r(mean)

sum ttl_exp, meanonly
gen c_ttl_exp = (ttl_exp - r(mean))/10

sum age, meanonly
gen c_age = age - r(mean)

// -inteff- does not (yet) recognize Stata's new
// factor variable notation, so we need to make 
// our own dummies and interactions
gen c_gradeXc_ttl = c_grade*c_ttl_exp
gen black = race == 2 if race < .
gen other = race == 3 if race < .

// use inteff
logit good_occ c_grade c_ttl_exp c_gradeXc_ttl ///
      black other south c_age married never_married
inteff good_occ c_grade c_ttl_exp c_gradeXc_ttl ///
       black other south c_age married never_married
*------------------------- end example --------------------

Hope this helps,
Maarten

References:
Maarten L. Buis (2010) Stata tip 87: Interpretation of interactions
in non-linear models. The Stata Journal, 10(2):305--308. 

Edward C. Norton, Hua Wang, and Chunrong Ai (2004) Computing
interaction effects and standard errors in logit and probit models.
The Stata Journal, 4(2):154--167.

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
