Re: st: Using ivregress when the endogenous variable is used in an interaction term in the main regression

From	Christine Scheef <[email protected]>
To	[email protected]
Subject	Re: st: Using ivregress when the endogenous variable is used in an interaction term in the main regression
Date	Thu, 22 Dec 2011 20:19:36 +0100
Hey,

I have been following your discussion because I am working on a similar
problem at the moment. However, my endogenous variable in the interaction
is binary. I have 3 questions:
- Since the endogenous variable is binary, is it right to use logit
instead of regress in the first stage?
- I also want to estimate a 3-way interaction of 2 continuous exogenous
variables with the endogenous binary variable. Can I form the interaction
of X2hat with X1 and X3, that is, X2hat*X1*X3?
- When running the third step with ivregress, do I still need to
check the exogeneity of the instrumental variables X2hat and X2hat*X1?

I very much appreciate your help.

Best,
Christine


Nick,

I don't have a specific reference in mind, but I suppose you should be
able to construct a workable explanation from Prof. Wooldridge's reply
(indeed, you can probably directly cite it) and some directed
googling.

T

On Wed, Dec 21, 2011 at 10:01 AM, Nick Kohn <[email protected]> wrote:
> My apologies for spamming but I also wanted to mention that I'm trying
> out the specification that includes the endogenous variables as stand
> alone terms.
>
> I'm not sure whether I'll use it in my paper though because I'll need
> to provide a justification of why I deviate from the paper I cite, and
> going into long winded econometric arguments is beyond the scope of
> what I'm doing.
>
> Is there a paper or book I can cite that explains why adding the
> levels is appropriate?
>
> On Wed, Dec 21, 2011 at 6:58 PM, Nick Kohn <[email protected]> wrote:
>> Sorry for the confusion - X1 is included as a stand alone term.
>>
>> To be more detailed, my model looks like this (X is exogenous, E is
>> endogenous):
>>
>> dY = X1 + X2
>>     + X1*X3
>>     + X1*X3*E1
>>     + X1*X3*E2
>>     + X1*X3*E3
>>     + controls
>>
>> X3 is an indicator variable that is equal to 1 when X1 <= 0
>>
>> On Wed, Dec 21, 2011 at 6:44 PM, Austin Nichols <[email protected]> wrote:
>>> Tirthankar Chakravarty <[email protected]>:
>>> I don't see anywhere that the X1 is included as a main effect as
>>> opposed to just being included in the product X1*X2.  (Though it is
>>> not clear what is included in "+controls" in the post.) It seems that
>>> X1 is exogenous by assumption, i.e. X1 is uncorrelated with e while X2
>>> is correlated with e. There are no quadratic terms in Z in my
>>> suggestion. Note that you suggested instrumenting with X2hat*X1 and
>>> X2hat is linear in Z.
>>>
>>> On Wed, Dec 21, 2011 at 12:15 PM, Tirthankar Chakravarty
>>> <[email protected]> wrote:
>>>> " It does not seem too much of a stretch to assume Z*X1
>>>> uncorrelated with e as well (which implies X2hat*X1 uncorrelated with
>>>> e)"
>>>>
>>>> This part is the problem. When you form cross-products of the
>>>> instrument matrix, you will end up with quadratic terms in Z, coming
>>>> from terms like the one you mention, which will need to be
>>>> uncorrelated with the structural errors, hence the independence
>>>> requirement.
>>>>
>>>> Again, note that X1 is included so there is no overidentification
>>>> (or, at best, the same degree of overidentification as without the
>>>> interaction term).
>>>>
>>>> T
>>>>
>>>> On Wed, Dec 21, 2011 at 8:57 AM, Austin Nichols <[email protected]> wrote:
>>>>> Tirthankar Chakravarty <[email protected]>:
>>>>> No conditional independence assumed, though of course an
>>>>> independence assumption lets you form all kinds of transformations
>>>>> of Z to use as excluded instruments.
>>>>>
>>>>> We need Z, Z*X1, and X1 uncorrelated with e, but Z and e were
>>>>> already assumed uncorrelated and X1 is exogenous by assumption as
>>>>> well, in the original post.  It does not seem too much of a stretch
>>>>> to assume Z*X1 uncorrelated with e as well (which implies X2hat*X1
>>>>> uncorrelated with e), but if we use all 3 as instruments we will
>>>>> see evidence of any violations of assumptions in the overid test
>>>>> (assuming no weak instruments problem).
>>>>>
>>>>> On Wed, Dec 21, 2011 at 11:44 AM, Tirthankar Chakravarty
>>>>> <[email protected]> wrote:
>>>>>> Austin,
>>>>>>
>>>>>> I agree re: well-cited papers.
>>>>>>
>>>>>> Note that the efficiency you mention comes at a cost. As I pointed
>>>>>> out in my previous Statalist reply:
>>>>>> http://www.stata.com/statalist/archive/2011-08/msg01496.html
>>>>>> the instrumenting strategy you suggest requires the instruments to
>>>>>> be conditionally independent rather than just uncorrelated with the
>>>>>> structural errors.
>>>>>>
>>>>>> T
>>>>>>
>>>>>> On Wed, Dec 21, 2011 at 7:57 AM, Austin Nichols
>>>>>> <[email protected]> wrote:
>>>>>>> Nick Kohn <[email protected]>:
>>>>>>> Or better, instrument for X1*X2 using Z, Z*X1, and X1.
>>>>>>> For maximal efficiency given your assumptions you may prefer
>>>>>>> to instrument for X1*X2 using Z*X1, or even
>>>>>>> to instrument for X1*X2 using X2hat*X1,
>>>>>>> but you should build in an overid test whenever feasible.
>>>>>>>
>>>>>>> Just because a well-cited paper does something wrong does not mean
>>>>>>> you have to, though.
>>>>>>>
>>>>>>> Including the main effects of X1 and X2 makes for harder
>>>>>>> interpretation, but will make you a lot more confident of your
>>>>>>> answers once you have worked out the interpretation.
>>>>>>>
>>>>>>> On Wed, Dec 21, 2011 at 9:20 AM, Tirthankar Chakravarty
>>>>>>> <[email protected]> wrote:
>>>>>>>> In that case, none of this is necessary. Just instrument for
>>>>>>>> X1*X2 using Z. All standard results apply.
>>>>>>>>
>>>>>>>> T
>>>>>>>>
>>>>>>>> On Wed, Dec 21, 2011 at 6:03 AM, Nick Kohn
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> Hmmm I see what you mean, but I'm following the methodology of a
>>>>>>>>> well cited paper that does the same thing.
>>>>>>>>>
>>>>>>>>> I'll be sure to discuss this limitation, but in terms of using
>>>>>>>>> this model, would the 3 steps in my last message be correct?
>>>>>>>>>
>>>>>>>>> On Wed, Dec 21, 2011 at 2:56 PM, Tirthankar Chakravarty
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> I wanted to indirectly confirm that you did have the main effect
>>>>>>>>>> in the regression because even though I don't know the nature of
>>>>>>>>>> your study, a hard-to-defend methodological position arises when
>>>>>>>>>> you include interaction terms without including the main effect.
>>>>>>>>>> You might want to take that on the authority of someone who
>>>>>>>>>> (literally) wrote the book on the subject:
>>>>>>>>>>
>>>>>>>>>> http://www.stata.com/statalist/archive/2011-03/msg00188.html
>>>>>>>>>>
>>>>>>>>>> and reconsider your decision to not include the main effect.
>>>>>>>>>>
>>>>>>>>>> T
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 21, 2011 at 5:46 AM, Nick Kohn
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> My model doesn't have X2 as a separate term, so in terms of the
>>>>>>>>>>> model you had it looks like:
>>>>>>>>>>>  Y = b*X1*X2 + controls
>>>>>>>>>>> So the only place the endogenous variable comes up is the
>>>>>>>>>>> interaction term
>>>>>>>>>>>
>>>>>>>>>>> At the risk of being repetitive, would these be the correct steps
>>>>>>>>>>> (so essentially only step 3 changes from what you said):
>>>>>>>>>>> 1) regress X2 on all instruments, exogenous variables and controls
>>>>>>>>>>> 2) Form interactions of X2hat with the exogenous variable X1,
>>>>>>>>>>>    that is, X2hat*X1
>>>>>>>>>>> 3) ivregress instrumenting for X2*X1 using X2hat*X1.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 21, 2011 at 1:44 PM, Tirthankar Chakravarty
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>> Not quite; here is the recommended procedure (I am assuming that
>>>>>>>>>>>> you have the main effect of the endogenous variable in there as
>>>>>>>>>>>> in Y = a*X2 + b*X1*X2 + controls):
>>>>>>>>>>>>
>>>>>>>>>>>> 1) -regress- X2 on _all_ instruments (included exogenous controls
>>>>>>>>>>>> and excluded instruments) and get predictions X2hat.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) Form interactions of X2hat with the exogenous variable X1,
>>>>>>>>>>>> that is, X2hat*X1.
>>>>>>>>>>>>
>>>>>>>>>>>> 3) -ivregress- instrumenting for X2 and X2*X1 using X2hat and
>>>>>>>>>>>> X2hat*X1.
>>>>>>>>>>>>
>>>>>>>>>>>> Note that there is a distinction between two calls to -regress-
>>>>>>>>>>>> and using -ivregress- for 3).
>>>>>>>>>>>>
>>>>>>>>>>>> T
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Dec 21, 2011 at 3:43 AM, Nick Kohn
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My simplified model is (X2 is endogenous):
>>>>>>>>>>>>> Y = b*X1*X2 + controls
>>>>>>>>>>>>>
>>>>>>>>>>>>> With regard to the third option you suggest, would I do the
>>>>>>>>>>>>> following?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  1) First stage regression to get X2hat using the instrument Z
>>>>>>>>>>>>>  2) Run the first stage again but use X1*X2hat as the instrument
>>>>>>>>>>>>>     for X1*X2 (so Z is no longer used)
>>>>>>>>>>>>>  3) Run the second stage using (X1*X2)hat (so the whole product
>>>>>>>>>>>>>     is fitted from step 2)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Dec 21, 2011 at 12:24 PM, Tirthankar Chakravarty
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>> You can see my previous reply to a similar question here:
>>>>>>>>>>>>>> http://www.stata.com/statalist/archive/2011-08/msg01496.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> T
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Dec 21, 2011 at 2:24 AM, Nick Kohn
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a specification in which the endogenous variable is
>>>>>>>>>>>>>>> interacted with an exogenous variable. Since I cannot multiply
>>>>>>>>>>>>>>> the variables directly in the regression, I create a new
>>>>>>>>>>>>>>> variable. In ivregress it makes no sense to use the entire
>>>>>>>>>>>>>>> interaction term as the endogenous variable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can do the first stage manually (and then use the fitted value
>>>>>>>>>>>>>>> in the main regression), however, from what I remember the
>>>>>>>>>>>>>>> standard errors will be wrong when doing it manually.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there a way to overcome this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
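
A minimal Stata sketch of the two approaches quoted earlier in this thread: the three-step procedure for Y = a*X2 + b*X1*X2 + controls, and the overidentified variant for Y = b*X1*X2 + controls. The variable names y, x1, x2, z, c1 and c2 are placeholders assumed for illustration and do not come from the thread: x2 is the endogenous regressor, x1 the exogenous variable it is interacted with, z the excluded instrument, and c1 c2 exogenous controls. The first specification also assumes that x1 enters the model as one of the controls.

```stata
* Placeholder names (assumed, not from the thread):
*   y = outcome, x2 = endogenous, x1 = exogenous interaction variable,
*   z = excluded instrument, c1 c2 = exogenous controls.

* Step 1: regress x2 on ALL instruments (excluded instrument plus the
* included exogenous variables) and keep the linear prediction.
regress x2 z x1 c1 c2
predict double x2hat, xb

* Step 2: interact the fitted value with x1, and build the actual
* interaction term that appears in the structural equation.
generate double x2hat_x1 = x2hat * x1
generate double x2_x1    = x2 * x1

* Step 3: hand the estimation to -ivregress- rather than running a
* second -regress- on fitted values by hand.
ivregress 2sls y x1 c1 c2 (x2 x2_x1 = x2hat x2hat_x1), vce(robust)
estat firststage

* Variant for the model without main effects (Y = b*X1*X2 + controls):
* instrument x2_x1 with z, z*x1 and x1, then test the overidentifying
* restrictions.
generate double z_x1 = z * x1
ivregress 2sls y c1 c2 (x2_x1 = z z_x1 x1), vce(robust)
estat overid
```

Because -ivregress- performs the estimation in both calls, the reported standard errors avoid the manual second-stage problem raised in the original post. The first specification is exactly identified (two endogenous terms, two excluded instruments), so -estat overid- applies only to the second.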

