
# Re: st: Using ivregress when the endogenous variable is used in an interaction term in the main regression

- From: Nick Kohn
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Using ivregress when the endogenous variable is used in an interaction term in the main regression
- Date: Wed, 21 Dec 2011 19:01:05 +0100

```
My apologies for spamming, but I also wanted to mention that I'm trying
out the specification that includes the endogenous variables as
standalone terms.

I'm not sure whether I'll use it in my paper though because I'll need
to provide a justification of why I deviate from the paper I cite, and
going into long-winded econometric arguments is beyond the scope of
what I'm doing.

Is there a paper or book I can cite that explains why adding the
levels is appropriate?

On Wed, Dec 21, 2011 at 6:58 PM, Nick Kohn <coffeemug.nick@gmail.com> wrote:
> Sorry for the confusion - X1 is included as a standalone term.
>
> To be more detailed, my model looks like this (X is exogenous, E is endogenous):
>
> dY = X1 + X2
>     + X1*X3
>     + X1*X3*E1
>     + X1*X3*E2
>     + X1*X3*E3
>     + controls
>
> X3 is an indicator variable that is equal to 1 when X1 <= 0
>
> On Wed, Dec 21, 2011 at 6:44 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>:
>> I don't see anywhere that the X1 is included as a main effect as
>> opposed to just being included in the product X1*X2.  (Though it is
>> not clear what is included in "+controls" in the post.) It seems that
>> X1 is exogenous by assumption, i.e. X1 is uncorrelated with e while X2
>> is correlated with e. There are no quadratic terms in Z in my
>> suggestion. Note that you suggested instrumenting with X2hat*X1 and
>> X2hat is linear in Z.
>>
>> On Wed, Dec 21, 2011 at 12:15 PM, Tirthankar Chakravarty
>> <tirthankar.chakravarty@gmail.com> wrote:
>>> " It does not seem too much of a stretch to assume Z*X1
>>> uncorrelated with e as well (which implies X2hat*X1 uncorrelated with
>>> e)"
>>>
>>> This part is the problem. When you form cross-products of the
>>> instrument matrix, you will end up with quadratic terms in Z, coming
>>> from terms like the one you mention, which will need to be
>>> uncorrelated with the structural errors, hence the independence
>>> requirement.
>>>
>>> Again, note that X1 is included so there is no overidentification (or,
>>> at best, the same degree of overidentification as without the
>>> interaction term).
>>>
>>> T
>>>
>>> On Wed, Dec 21, 2011 at 8:57 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>:
>>>> No conditional independence assumed, though of course an independence
>>>> assumption lets you form all kinds of transformations of Z to use as
>>>> excluded instruments.
>>>>
>>>> We need Z, Z*X1, and X1 uncorrelated with e, but Z and e were already
>>>> assumed uncorrelated and X1 is exogenous by assumption as well, in the
>>>> original post.  It does not seem too much of a stretch to assume Z*X1
>>>> uncorrelated with e as well (which implies X2hat*X1 uncorrelated with
>>>> e), but if we use all 3 as instruments we will see evidence of any
>>>> violations of assumptions in the overid test (assuming no weak
>>>> instruments problem).
>>>>
>>>> On Wed, Dec 21, 2011 at 11:44 AM, Tirthankar Chakravarty
>>>> <tirthankar.chakravarty@gmail.com> wrote:
>>>>> Austin,
>>>>>
>>>>> I agree re: well-cited papers.
>>>>>
>>>>> Note that the efficiency you mention comes at a cost. As I pointed out
>>>>> in my previous Statalist reply:
>>>>> http://www.stata.com/statalist/archive/2011-08/msg01496.html
>>>>> the instrumenting strategy you suggest requires the instruments to be
>>>>> conditionally independent rather than just uncorrelated with the
>>>>> structural errors.
>>>>>
>>>>> T
>>>>>
>>>>> On Wed, Dec 21, 2011 at 7:57 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>> Nick Kohn <coffeemug.nick@gmail.com>:
>>>>>> Or better, instrument for X1*X2 using Z, Z*X1, and X1.
>>>>>> For maximal efficiency given your assumptions you may prefer
>>>>>> to instrument for X1*X2 using Z*X1, or even
>>>>>> to instrument for X1*X2 using X2hat*X1,
>>>>>> but you should build in an overid test whenever feasible.
>>>>>>
>>>>>> Just because a well-cited paper does something wrong does not mean you
>>>>>> have to, though.
>>>>>>
>>>>>> Including the main effects of X1 and X2 makes for harder interpretation, but
>>>>>> will make you a lot more confident of your answers once you have worked out the
>>>>>> interpretation.
>>>>>>
>>>>>> On Wed, Dec 21, 2011 at 9:20 AM, Tirthankar Chakravarty
>>>>>> <tirthankar.chakravarty@gmail.com> wrote:
>>>>>>> In that case, none of this is necessary. Just instrument for X1*X2
>>>>>>> using Z. All standard results apply.
>>>>>>>
>>>>>>> T
>>>>>>>
>>>>>>> On Wed, Dec 21, 2011 at 6:03 AM, Nick Kohn <coffeemug.nick@gmail.com> wrote:
>>>>>>>> Hmm, I see what you mean, but I'm following the methodology of a
>>>>>>>> well-cited paper that does the same thing.
>>>>>>>>
>>>>>>>> I'll be sure to discuss this limitation, but in terms of using this
>>>>>>>> model, would the 3 steps in my last message be correct?
>>>>>>>>
>>>>>>>> On Wed, Dec 21, 2011 at 2:56 PM, Tirthankar Chakravarty
>>>>>>>> <tirthankar.chakravarty@gmail.com> wrote:
>>>>>>>>> I wanted to indirectly confirm that you did have the main effect in
>>>>>>>>> the regression because even though I don't know the nature of your
>>>>>>>>> study, a hard-to-defend methodological position arises when you
>>>>>>>>> include interaction terms without including the main effect. You might
>>>>>>>>> want to take that on the authority of someone who (literally) wrote
>>>>>>>>> the book on the subject:
>>>>>>>>>
>>>>>>>>> http://www.stata.com/statalist/archive/2011-03/msg00188.html
>>>>>>>>>
>>>>>>>>> and reconsider your decision to not include the main effect.
>>>>>>>>>
>>>>>>>>> T
>>>>>>>>>
>>>>>>>>> On Wed, Dec 21, 2011 at 5:46 AM, Nick Kohn <coffeemug.nick@gmail.com> wrote:
>>>>>>>>>> My model doesn't have X2 as a separate term, so in terms of the model
>>>>>>>>>> you had it looks like:
>>>>>>>>>>  Y = b*X1*X2 + controls
>>>>>>>>>> So the only place the endogenous variable comes up is in the interaction term.
>>>>>>>>>>
>>>>>>>>>> At the risk of being repetitive, would these be the correct steps (so
>>>>>>>>>> essentially only step 3 changes from what you said):
>>>>>>>>>> 1) regress X2 on all instruments, exogenous variables and controls
>>>>>>>>>> 2) Form interactions of X2hat with the exogenous variable X1, that is, X2hat*X1
>>>>>>>>>> 3) ivregress instrumenting for X2*X1 using X2hat*X1.
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 21, 2011 at 1:44 PM, Tirthankar Chakravarty
>>>>>>>>>> <tirthankar.chakravarty@gmail.com> wrote:
>>>>>>>>>>> Not quite; here is the recommended procedure (I am assuming that you
>>>>>>>>>>> have the main effect of the endogenous variable in there as in Y =
>>>>>>>>>>> a*X2 + b*X1*X2 + controls):
>>>>>>>>>>>
>>>>>>>>>>> 1) -regress- X2 on _all_ instruments (included exogenous controls and
>>>>>>>>>>> excluded instruments) and get predictions X2hat.
>>>>>>>>>>>
>>>>>>>>>>> 2) Form interactions of X2hat with the exogenous variable X1, that is, X2hat*X1.
>>>>>>>>>>>
>>>>>>>>>>> 3) -ivregress- instrumenting for X2 and X2*X1 using X2hat and X2hat*X1.
>>>>>>>>>>>
>>>>>>>>>>> Note that there is a distinction between two calls to -regress- and
>>>>>>>>>>> using -ivregress- for 3).
>>>>>>>>>>>
>>>>>>>>>>> T
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 21, 2011 at 3:43 AM, Nick Kohn <coffeemug.nick@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> My simplified model is (X2 is endogenous):
>>>>>>>>>>>> Y = b*X1*X2 + controls
>>>>>>>>>>>>
>>>>>>>>>>>> With regard to the third option you suggest, would I do the following?
>>>>>>>>>>>>
>>>>>>>>>>>>  1) First stage regression to get X2hat using the instrument Z
>>>>>>>>>>>>  2) Run the first stage again but use X1*X2hat as the instrument for
>>>>>>>>>>>> X1*X2 (so Z is no longer used)
>>>>>>>>>>>>  3) Run the second stage using (X1*X2)hat (so the whole product is
>>>>>>>>>>>> fitted from step 2))
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Dec 21, 2011 at 12:24 PM, Tirthankar Chakravarty
>>>>>>>>>>>> <tirthankar.chakravarty@gmail.com> wrote:
>>>>>>>>>>>>> You can see my previous reply to a similar question here:
>>>>>>>>>>>>> http://www.stata.com/statalist/archive/2011-08/msg01496.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> T
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Dec 21, 2011 at 2:24 AM, Nick Kohn <coffeemug.nick@gmail.com> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a specification in which the endogenous variable is interacted
>>>>>>>>>>>>>> with an exogenous variable. Since I cannot multiply the variables
>>>>>>>>>>>>>> directly in the regression, I create a new variable. In ivregress it
>>>>>>>>>>>>>> makes no sense to use the entire interaction term as the endogenous
>>>>>>>>>>>>>> variable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can do the first stage manually (and then use the fitted value in
>>>>>>>>>>>>>> the main regression), however, from what I remember the standard
>>>>>>>>>>>>>> errors will be wrong when doing it manually.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a way to overcome this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

```
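The instrumenting strategies discussed in the thread can be sketched in Stata roughly as follows. This is a minimal illustration on simulated data, not the code of any poster: the variable names (`y_full`, `y_int`, `x1`, `x2`, `z`, `c1`, `x1x2`, `zx1`), the coefficients, and the data-generating process are all assumptions made up for the example. The first call treats both X2 and X1*X2 as endogenous, with Z and Z*X1 as excluded instruments. The second covers the interaction-only model, where using both Z and Z*X1 overidentifies the equation so that an overid test is available, as Austin Nichols suggests.

```stata
* Hypothetical simulated data; every name and coefficient here is illustrative.
clear
set seed 12345
set obs 1000

* Exogenous variables, instrument, and structural error
generate z  = rnormal()
generate x1 = 1 + rnormal()
generate c1 = rnormal()
generate u  = rnormal()

* x2 is endogenous because it depends on the structural error u
generate x2 = 0.8*z + 0.5*u + rnormal()

* Interaction of the endogenous variable with x1, and of the instrument with x1
generate x1x2 = x1*x2
generate zx1  = z*x1

* Two outcomes: one with the main effect of x2, one with the interaction only
generate y_full = 1 + 0.5*x1 + 0.7*x2 + 0.3*x1x2 + 0.2*c1 + u
generate y_int  = 1 + 0.5*x1 + 0.3*x1x2 + 0.2*c1 + u

* Main effect included: x2 and x1x2 are both endogenous,
* z and zx1 are the excluded instruments (exactly identified)
ivregress 2sls y_full x1 c1 (x2 x1x2 = z zx1), vce(robust)
estat firststage

* Interaction only: one endogenous regressor, two excluded instruments,
* so a test of overidentifying restrictions can be run
ivregress 2sls y_int x1 c1 (x1x2 = z zx1), vce(robust)
estat overid
```

Because `ivregress` estimates both stages in a single call, the reported standard errors account for the first stage. That is the concern raised in the original question about running the first stage by hand with `regress` and plugging in fitted values.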