Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: First stage F stats - xtivreg

| From | Agnese Romiti |
| --- | --- |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: First stage F stats - xtivreg |
| Date | Wed, 22 Jun 2011 11:19:49 +0200 |

```
Austin,

I'm afraid I can't simply control for individual fixed effects with
dummies in ivreg2: there are too many of them, and Stata won't even let
me create them, even after increasing the maximum number of variables.

However, before showing the comparison between the three clusterings,
there is something in the ivreg2 run on the within-transformed data that
worries me. I would expect the results from ivreg2 on the transformed
data (excluding the constant) to be exactly equivalent to those from
xtivreg2 on the original data. I've checked with a very simple
specification, with no clustering, but the coefficients differ between
the two specifications. Is this expected?

I've summarized the results from clustering by individual (using
xtivreg2) and from clustering by Region and individual (using xtivreg2).
I ran the cluster-by-Region and cluster-by-Region*year versions using
ivreg2 on the (manually) within-transformed data, because of the
above-mentioned problem with xtivreg2 and the cluster unit.

As for the comparison of first-stage statistics and the SE on the main
variable under each type of clustering:

-clustering by individual gives an F (Kleibergen-Paap) of 849, and
the SE for the main variable (the one aggregated at region-year) is
0.09.
-clustering by individual and Region gives an F (Kleibergen-Paap)
of 207.49, and the SE for the main variable (the one aggregated at
region-year) is around 0.15.
-clustering by Region*year gives an F (Kleibergen-Paap) of 576, and
the SE for the main variable (the one aggregated at region-year) is
0.06.
-clustering by Region gives an F (Kleibergen-Paap) of 308.19, and
the SE for the main variable (the one aggregated at region-year) is
0.13.

Apologies for the clutter.

Many thanks

2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> Agnese Romiti <romitiagnese@gmail.com>:
> You are right about -xtivreg2- refusing to participate, so you could
> simply include dummies for every fixed effect in -ivreg2-, e.g.
>
> webuse nhanes2, clear
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
> qui ta houssiz, gen(d_)
> ivreg2 hlthstat (iron=lead) d_*, fwl(d_*) cluster(location sampl)
>
> Or you could cluster by initial region instead, e.g.
>
> bys i (t): g initregion=region[1]
>
> which involves different assumptions, but will also give you evidence
> of how the data seem to be clustered.
>
> On Tue, Jun 21, 2011 at 1:06 PM, Agnese Romiti <romitiagnese@gmail.com> wrote:
>> Dear Austin,
>>
>> When I used region-year, or region alone, as the cluster unit, I had to
>> run ivreg2 on data that I had previously transformed into deviations
>> from the mean (within transformation), because xtivreg2 requires that
>> no panel overlap more than one cluster. So panels must be uniquely
>> assigned to clusters.
>>  I tried instead to run xtivreg2 with two clusters as you suggested,
>> but I received the error message "cluster():  too many variables
>> specified", apparently because I don't have the latest version of the
>> commands. I have just done an update all, and my Stata seems to be
>> updated to 30 March 2011 (exe and ado) and to 1 Sept 2010 (the
>> utilities). Is there a reason why I still get the error?
>>
>> Thanks
>> Agnese
>>
>>
>>
>>
>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
>>> Agnese Romiti <romitiagnese@gmail.com>:
>>> I don't see how it matters that individuals move across clusters,
>>> unless you want to cluster by individual as well, and -xtivreg2-
>>> allows two dimensions of clustering. When you cluster by region-year,
>>> you assume that a draw from the dgp of person i in year t is
>>> independent from a draw from the dgp of person i in year t+1, which is
>>> clearly problematic.  You should try clustering by individual, by
>>> region, and then try two dimensions of clustering.  Let us know how
>>> the first stage diagnostic statistics and SEs on main variables of
>>> interest, in each of those 3 cases, compare to your
>>> region-year-clustered version.
>>>
>>> On Tue, Jun 21, 2011 at 10:47 AM, Agnese Romiti <romitiagnese@gmail.com> wrote:
>>>> Austin,
>>>>
>>>> The reason I chose region-year as the cluster unit is that
>>>> individuals (around 8 percent of them) move across regions over
>>>> time, so the region is not unique for them.
>>>>
>>>> Many thanks again for your help and the ref.
>>>> Agnese
>>>>
>>>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
>>>>> Agnese Romiti <romitiagnese@gmail.com>
>>>>> In that case the cluster-robust SE will be biased downward slightly,
>>>>> resulting in overrejection and your first-stage F stat overstated, but
>>>>> I expect it will still outperform the SE and F clustering by
>>>>> region-year.  You would have to do simulations matching your exact
>>>>> setup to be sure; see e.g.
>>>>> http://www.stata.com/meeting/13uk/nichols_crse.pdf
>>>>>
>>>>> On Tue, Jun 21, 2011 at 3:27 AM, Agnese Romiti <romitiagnese@gmail.com> wrote:
>>>>>> Hi,
>>>>>> Thanks again
>>>>>> In my data I have 19 regions, and around 18 percent of the data in the
>>>>>> largest region.
>>>>>>
>>>>>> Agnese
>>>>>>
>>>>>>
>>>>>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
>>>>>>> Agnese Romiti <romitiagnese@gmail.com>:
>>>>>>> No, you should cluster by region to correctly account for possible
>>>>>>> serial correlation,
>>>>>>> assuming you have sufficiently many regions in your data; how many are there?
>>>>>>> What percent of the data is in the largest region?
>>>>>>>
>>>>>>> On Mon, Jun 20, 2011 at 5:19 PM, Agnese Romiti <romitiagnese@gmail.com> wrote:
>>>>>>>> Many thanks Austin,
>>>>>>>>
>>>>>>>> I'm actually clustering the standard errors at region-year level
>>>>>>>> rather than at region because I have one regressor with variability at
>>>>>>>> region-year level. Is that correct?
>>>>>>>> Do you think that the high first-stage F stat might be a signal of a
>>>>>>>> bad instrument? Like a failure of the exogeneity requirement?
>>>>>>>>
>>>>>>>> Agnese
>>>>>>>>
>>>>>>>>
>>>>>>>> 2011/6/20 Austin Nichols <austinnichols@gmail.com>:
>>>>>>>>> Agnese Romiti <romitiagnese@gmail.com>:
>>>>>>>>> Are you clustering by region to account for the likely correlation of
>>>>>>>>> errors within region?
>>>>>>>>> Also see
>>>>>>>>> http://www.stata.com/meeting/boston10/boston10_nichols.pdf
>>>>>>>>> for an alternative model that allows your dep var to be nonnegative.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 20, 2011 at 3:49 AM, Agnese Romiti <romitiagnese@gmail.com> wrote:
>>>>>>>>>> Dear Statalist users,
>>>>>>>>>>
>>>>>>>>>> I'm running a fixed effects model with IV (xtivreg2). My dependent
>>>>>>>>>> variable is a measure of labor supply at the individual level (working
>>>>>>>>>> hours), whereas my endogenous variable varies only at the
>>>>>>>>>> region-year level.
>>>>>>>>>> My question is about the first-stage statistics: the weak
>>>>>>>>>> identification test gives an extremely high F statistic (F=3289),
>>>>>>>>>> which makes me worry that something is wrong.
>>>>>>>>>> Do you have any clue about potential reasons driving this odd result?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Agnese

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
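
The check described at the top of the thread (ivreg2 on manually within-transformed data versus xtivreg2 with fixed effects on the original data) could be sketched roughly as follows. This is only an illustration, not code from the thread: it reuses the nhanes2 variables from Austin's example, the `m_` and `w_` variable names and the scalar names are made up here, and it assumes the user-written commands ivreg2 and xtivreg2 are installed. It restricts the data to complete cases first, on the assumption that demeaning over a different sample than the one used in estimation is one plausible source of differing coefficients.

```stata
* Sketch only: compare a manual within transformation with xtivreg2, fe.
* Assumes ivreg2 and xtivreg2 are installed (e.g. from SSC).
webuse nhanes2, clear

* Demean over exactly the observations that enter the estimation
* (assumption: otherwise the group means use a different sample)
keep if !missing(hlthstat, iron, lead, houssiz)

* Within transformation of the outcome, the endogenous regressor,
* and the instrument; m_* and w_* are hypothetical names
foreach v of varlist hlthstat iron lead {
    bysort houssiz: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}

* IV on the transformed data, excluding the constant
ivreg2 w_hlthstat (w_iron = w_lead), noconstant
scalar b_manual = _b[w_iron]

* Fixed-effects IV on the original data
xtivreg2 hlthstat (iron = lead), fe i(houssiz)
scalar b_fe = _b[iron]

* Show the two point estimates and their difference
display "manual within: " b_manual "   xtivreg2, fe: " b_fe
display "difference:    " b_manual - b_fe
```

If the two point estimates agree here but not in the real data, the transformation step (sample used, variables demeaned) would be the first thing to look at. The standard errors may still differ between the two runs, since the manual version is not told how many fixed effects were removed.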