Re: st: Confusion about collinearity
From: Richard Williams <[email protected]>
To: [email protected], "[email protected]" <[email protected]>
Subject: Re: st: Confusion about collinearity
Date: Mon, 02 Dec 2013 01:15:07 -0500
See comments below.
At 12:40 AM 12/2/2013, Yarbrough, Kevin T CADET MIL USA USMA wrote:
I'm having a problem with collinearity in my
difference-in-differences model. I'm using Stata 12 for Windows.
I'm using pooled cross sections from a 5% sample of ACS data from
2011 to 2011 across all 50 states and Washington, D.C. I'm attempting
to analyze a policy implementation with the
difference-in-differences (DID) identification method. 25 states have
implemented this policy in different years. I'm trying to measure
the effect on income. To my knowledge, a correct use of DID should
result in a model: y = B0 + B1(Treatment_group) + B2(post-policy year)
+ B3(interaction term of Treatment_group and post-policy year).
I began by creating a dummy variable for each year that equals 1 if
a state had the program in that year. For example:
gen eitc00=1 if year==2000 & (statefip==08 | statefip==11 | ///
    statefip==17 | statefip==19 | statefip==20 | statefip==23 | ///
    statefip==24 | statefip==25 | statefip==27 | statefip==34 | ///
    statefip==36 | statefip==41 | statefip==44 | statefip==50 | ///
    statefip==55)
I followed that up with this code to replace the missing values:
replace eitc00=0 if year==2000 & eitc00==.
If I am reading this code right, then eitc00 still = missing for all
years other than 2000. That means the regression will only include
data from 2000. Is this really what you want? How about this instead?
replace eitc00 = 0 if eitc00 == .
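A minimal sketch of the difference between the two replace commands, run on a made-up six-row dataset (the state codes, years, and the shortened inlist() call are illustrative assumptions, not the poster's actual data):

```stata
* Toy data: two hypothetical states observed in three years
clear
input statefip year
 8 2000
 8 2001
 8 2002
 1 2000
 1 2001
 1 2002
end

* Construction from the original post: only year-2000 rows are filled in
gen eitc00 = 1 if year==2000 & inlist(statefip, 8, 11, 17)
replace eitc00 = 0 if year==2000 & eitc00==.
count if missing(eitc00)      // rows from 2001 and 2002 are still missing

* Unconditional replace: every remaining missing value becomes 0
replace eitc00 = 0 if eitc00==.
count if missing(eitc00)      // no missing values are left

* Check the interaction with a 2002 dummy after the fix
gen yr02 = year==2002
gen eitc00yr02 = eitc00*yr02
tabulate year eitc00
summarize eitc00yr02
```

In this toy data eitc00 equals 1 only in rows from 2000 and yr02 equals 1 only in rows from 2002, so their product is 0 in every row even after the missing values are filled. If the real data behave the same way, a treatment indicator that does not depend on year may be needed for the interaction to vary.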
I then created a dummy variable for years:
gen yr02=year==2002
In 2000, yr02 = 0.
And the interaction term between the two:
gen eitc00yr02=eitc00*yr02
In 2000, the interaction term = 0. In all other years it is missing
since eitc00 is missing for all years besides 2000.
My regression is
reg lnincwage eitc00 yr02 eitc00yr02
In 2000, yr02 and eitc00yr02 both equal 0. All other years get
dropped because eitc00 is missing for all other years.
Which results in the interaction term being omitted because of
collinearity. I cannot figure out why.
I'm surprised only the interaction gets dropped.
I may be making a mistake somewhere. But have you run descriptive
statistics on your created vars?
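One way such a check could look, sketched on a made-up four-row dataset (the lnincwage values and the single treated state are assumptions for illustration only):

```stata
* Toy data standing in for the ACS extract (values are made up)
clear
input statefip year lnincwage
 8 2000 10.1
 8 2002 10.3
 1 2000  9.8
 1 2002  9.9
end

* Variables built the same way as in the original post
gen eitc00 = 1 if year==2000 & statefip==8
replace eitc00 = 0 if year==2000 & eitc00==.
gen yr02 = year==2002
gen eitc00yr02 = eitc00*yr02

* Obs below the dataset size flags missing values;
* a standard deviation of 0 flags a variable that never varies
summarize eitc00 yr02 eitc00yr02

* Cross-tabulate, listing missing values as their own category
tabulate year eitc00, missing

* Values of yr02 among the rows a regression could actually use
tabulate yr02 if !missing(eitc00, eitc00yr02)
```

The last tabulation restricts attention to rows with no missing regressors, which is the sample regress keeps after listwise deletion. A regressor that takes a single value in that sample cannot be separated from the constant.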
As a sidelight I would use factor variable notation for my
interactions and dummy variables.
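A sketch of what factor variable notation might look like for a two-group, two-period DID. The variable names treated and post, the state list, the 2002 cutoff, and the data values are all hypothetical, and a single post dummy is a simplification when states adopt the policy in different years:

```stata
* Toy data: one treated and one control state, before and after (made up)
clear
input statefip year lnincwage
 8 2000 10.1
 8 2000 10.2
 8 2002 10.4
 8 2002 10.5
 1 2000  9.8
 1 2000  9.9
 1 2002  9.9
 1 2002 10.0
end

* Treatment group depends on state only, so it is nonmissing in every year
gen byte treated = inlist(statefip, 8, 11)

* Post-policy period indicator (the cutoff year is an assumption)
gen byte post = year >= 2002

* ## expands to both main effects plus their interaction;
* the coefficient on 1.treated#1.post is the DID estimate
regress lnincwage i.treated##i.post
```

With this notation Stata builds the dummies and the interaction itself, so there are no hand-made product variables that could silently pick up missing values.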
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam