
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Fwd: st: RE: Too high R2, when interacting endogenous regressor in -ivreg2-?

From   Jen Zhen <>
Subject   Fwd: st: RE: Too high R2, when interacting endogenous regressor in -ivreg2-?
Date   Wed, 11 Dec 2013 10:08:26 +0100

Dear Mark,

thanks for your reply!

My data are essentially cross-sectional in that I observe both
outcomes and endogenous regressor at the individual level and observe
each individual only once, even though my instrument varies by year
and (mostly) county.
The plot does not reveal any outliers.

However, I realized the following:
When running --ivreg2 y ex excontrols (en en_ex = z z_ex)--
I get estimates for two first-stage regressions, whose point estimates
I can also reproduce as follows:
--reg en    z z_ex ex excontrols--
--reg en_ex z z_ex ex excontrols--
The huge R2 for the latter seems to come from the huge R2 that the
subsample indicator, ex, achieves on its own there. If I drop that
subsample indicator from my regressions, the R2 falls to less
unreasonable levels (around 0.7). Conceptually, however, I would have
thought that I should control for the direct difference between the
two subsamples, as I do not wish to misinterpret any level differences
between the two subsamples as differences in the marginal effects of
z on en and of en on y, respectively?
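
The mechanism can be illustrated with a minimal simulation sketch, written
in Python rather than Stata and using made-up data. The sample size, the
50/50 split on ex, the coefficient 0.5 and the two means of en are all
assumptions for illustration, not values from the data discussed here.
Because en_ex equals en when ex = 1 and zero otherwise, regressing en_ex
on ex alone just fits the two group means, so the R2 depends on how far
the mean of en lies from zero relative to its spread.

```python
import random


def r2_on_dummy(y, d):
    """R2 from regressing y on a constant and a 0/1 dummy d.

    With a single dummy regressor the OLS fitted values are the two
    group means, so no matrix algebra is needed.
    """
    groups = {0: [], 1: []}
    for yi, di in zip(y, d):
        groups[di].append(yi)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    ybar = sum(y) / len(y)
    ssr = sum((yi - means[di]) ** 2 for yi, di in zip(y, d))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ssr / sst


def simulate(mean_en, n=20000, seed=1):
    """Simulate en, ex and en_ex = en * ex; return R2 of en_ex on ex.

    All numbers here are hypothetical: ex is a 50/50 dummy, z is a
    standard normal instrument, and en = mean_en + 0.5*z + noise.
    """
    rng = random.Random(seed)
    ex = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    en = [mean_en + 0.5 * zi + rng.gauss(0, 1) for zi in z]
    en_ex = [e * d for e, d in zip(en, ex)]
    return r2_on_dummy(en_ex, ex)


# Case 1: en has a mean far from zero, so ex alone explains most of en_ex.
r2_high_level = simulate(mean_en=10.0)
# Case 2: en is centred on zero, so ex alone explains almost nothing.
r2_zero_level = simulate(mean_en=0.0)

print("R2 of en_ex on ex, mean(en) = 10:", round(r2_high_level, 3))
print("R2 of en_ex on ex, mean(en) = 0: ", round(r2_zero_level, 3))
```

Under these assumptions the first R2 comes out well above 0.9 and the
second close to zero, although en and z are related in exactly the same
way in both cases. If something similar holds in the real data, the high
first-stage R2 for en_ex would reflect the level of en in the ex = 1
subsample rather than an unusually strong instrument.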

Thank you and best regards,

On Tue, Dec 10, 2013 at 8:18 PM, Schaffer, Mark E <> wrote:
> JZ,
>> -----Original Message-----
>> From: [mailto:owner-
>>] On Behalf Of Jen Zhen
>> Sent: 10 December 2013 16:36
>> To:
>> Subject: st: Too high R2, when interacting endogenous regressor in -ivreg2-?
>> Dear Statalist members,
>> after running an -ivreg2- estimation, I wanted to test formally whether results
>> differ between 2 subsamples defined by the exogenous dummy "ex". I have
>> followed the procedure explained by Kit Baum in the earlier post by Jana von
>> Stein, Kit Baum and Vassilis Monastiriotis
>> and estimated the following equation (where I've added a long list of further
>> exogenous controls, excontrols):
>>  ivreg2 y ex excontrols (en en_ex = z z_ex)
>> This largely seems to work. I obtain two first stage equation outputs for the
>> outcomes en and en_ex respectively, each with both z and z_ex amongst the set
>> of regressors, plus ex and excontrols.
>> What troubles me though is that for the second first-stage regression, that for
>> en_ex, I get an R2 of 0.998. Despite having a long list of controls and good data
>> quality, that makes me wonder whether something is wrong here or how I
>> could explain this high R2?
> You need to tell us more about your data and setup.  Are you using time-series data?  You can sometimes get very high R2s in a time-series setting and nothing is actually wrong.
> But if you are using cross-section data, then you are probably right to be worried.  FWIW, my first guess would be that you have a huge outlier.  In 2-D space, a scatterplot would have all but one datapoint bunched closely together, and the outlier is way off in the distance somewhere.  The regression line is basically connecting the outlier to the rest of the datapoints (and connecting them almost perfectly in terms of squared residuals, though of course this is an illusion).
> HTH,
> Mark
>> Thank you so much and kind regards,
>> JZ
>> *
>> *   For searches and help try:
>> *
>> *
>> *
