Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: Confusion about collinearity

From      Richard Williams
Subject   Re: st: Confusion about collinearity
Date      Mon, 02 Dec 2013 01:15:07 -0500

See comments below.

At 12:40 AM 12/2/2013, Yarbrough, Kevin T CADET MIL USA USMA wrote:
I'm having a problem with collinearity in my difference-in-differences model. I'm using Stata 12 for Windows.

I'm using a pooled cross section from a 5% sample of ACS data from 2011 to 2011 across all 50 states and Washington, D.C. I'm attempting to analyze a policy implementation with the difference-in-differences (DID) identification method. 25 states have implemented this policy in different years. I'm trying to measure the effect on income. To my knowledge, a correct use of DID should result in a model: y = B0 + B1(treatment_group) + B2(post-policy year) + B3(treatment_group x post-policy year), where the last term is the interaction of treatment_group and post-policy year.

I began by creating a dummy variable for each year that equals 1 if a state had the program in that year. For example:

gen eitc00=1 if year==2000 & (statefip==08 |statefip==11 | statefip==17 |statefip==19 | statefip==20 | statefip==23 | statefip==24 |statefip==25 |statefip==27 |statefip==34 |statefip==36 |statefip==41 | statefip==44 |statefip==50 |statefip==55)

I followed that up with this code to replace the missing values:

replace eitc00=0 if year==2000 & eitc00==.

If I am reading this code right, then eitc00 still = missing for all years other than 2000. That means the regression will only include data from 2000. Is this really what you want? How about this instead?

replace eitc00 = 0 if eitc00 == .

I then created a dummy variable for years:

gen yr02=year==2002

In 2000, yr02 = 0.

And the interaction term between the two:

gen eitc00yr02=eitc00*yr02

In 2000, the interaction term = 0. In all other years it is missing since eitc00 is missing for all years besides 2000.

My regression is

reg lnincwage eitc00 yr02 eitc00yr02

In 2000, yr02 and eitc00yr02 both equal 0. All other years get dropped because eitc00 is missing for all other years.

Which results in the interaction term being omitted because of collinearity. I cannot figure out why.

I'm surprised only the interaction gets dropped.
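
The estimation-sample problem described above can be reproduced on made-up data. The following is a minimal sketch, not the poster's actual data: it assumes a toy dataset with two years (2000 and 2002) and two state codes (8 standing in for a program state, 1 for a non-program state), and the outcome is random noise. The variable-creation commands are the ones quoted in this thread.

```stata
* Toy data, invented purely for illustration
clear
set seed 12345
set obs 200
gen year = cond(_n <= 100, 2000, 2002)
gen statefip = cond(mod(_n, 2) == 0, 8, 1)
gen lnincwage = rnormal()

* Variable creation as quoted in the thread (state list shortened to one code)
gen eitc00 = 1 if year == 2000 & statefip == 8
replace eitc00 = 0 if year == 2000 & eitc00 == .
gen yr02 = year == 2002
gen eitc00yr02 = eitc00 * yr02

regress lnincwage eitc00 yr02 eitc00yr02

* Which observations were actually used in the regression?
tabulate year if e(sample)
summarize yr02 eitc00yr02 if e(sample)
```

In this toy setup the estimation sample should contain only the year-2000 observations, and both yr02 and eitc00yr02 should be constant at 0 within it, so neither coefficient can be estimated.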

I may be making a mistake somewhere. But have you run descriptive statistics on your created vars?
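
A minimal sketch of the kind of checks meant here, assuming the variables eitc00, yr02, and eitc00yr02 have already been created in memory as described earlier in this thread:

```stata
* Basic descriptives: look at N, min, and max for each created variable
summarize eitc00 yr02 eitc00yr02

* Cross-tabulations that include missing values as their own category
tabulate year eitc00, missing
tabulate yr02 eitc00yr02, missing

* Count of missing values per variable (misstable requires Stata 11 or later)
misstable summarize eitc00 yr02 eitc00yr02
```

A variable whose N is far below the dataset size, or whose min and max are equal among the nonmissing cases, is a likely source of dropped observations or omitted terms.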

As a sidelight I would use factor variable notation for my interactions and dummy variables.
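
A minimal sketch of what factor-variable notation could look like here. It is an illustration of the syntax only, not a recommended specification: the variable name treated is hypothetical, the state list is copied from the gen command quoted earlier, and restricting to 2000 and 2002 is an assumption that mirrors the two years mentioned in the thread.

```stata
* Hypothetical group indicator: 1 for the listed states in every year, 0 otherwise
* (note: inlist() returns 0 when statefip is missing)
gen byte treated = inlist(statefip, 8, 11, 17, 19, 20, 23, 24, 25, 27, 34, 36, 41, 44, 50, 55)

* yr02 is the year dummy created earlier in the thread
* i.treated##i.yr02 expands to both main effects plus their interaction
regress lnincwage i.treated##i.yr02 if inlist(year, 2000, 2002)
```

With the ## operator Stata builds the interaction itself, so there is no hand-made product variable that can silently inherit missing values from one of its components.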

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
