



RE: st: First stage F stats - xtivreg


From   "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: First stage F stats - xtivreg
Date   Tue, 21 Jun 2011 21:33:44 +0100

Austin,

I think I see what is going on.

The root of the problem is the fixed-effects incidental parameters problem.  If you use the classical (non-robust) VCE estimator, it needs a degrees-of-freedom (dof) adjustment to account for the FEs.  If the FEs are nested within clusters (each panel lies entirely within one cluster), then no dof adjustment is needed for the cluster-robust VCE.  But if panel units/FEs span more than one cluster, then some dof adjustment would be needed.  The problem is that, last I heard, no one had worked out what it should be:

http://www.stata.com/statalist/archive/2009-03/msg00353.html
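A minimal numerical check of the classical-VCE dof point, using the abdata example from later in this thread and official -xtreg- and -regress- rather than -xtivreg2-. This is an OLS illustration of the counting issue only; it does not settle the open cluster-robust case. OLS on hand-demeaned data divides the residual sum of squares by N - K, while -xtreg, fe- divides by N - G - K, where G is the number of panels, so rescaling by sqrt((N - K)/(N - G - K)) should reproduce the -xtreg, fe- standard error. The generated names (touse, mean_*, dm_*, and the scalars) are our own.

```stata
* Sketch: FE dof adjustment for the classical (non-robust) VCE.
* Data and variable names (id, year, n, w, k) are from abdata.
webuse abdata, clear
xtset id year

* Reference: -xtreg, fe- counts the G panel intercepts in its residual dof
xtreg n w k, fe
scalar se_fe = _se[w]
scalar n_fe  = e(N_g)                 // G = number of panels (fixed effects)
generate byte touse = e(sample)

* Within transformation by hand, as when feeding demeaned data to -regress-
foreach v in n w k {
    bysort id: egen double mean_`v' = mean(`v') if touse
    generate double dm_`v' = `v' - mean_`v' if touse
}

* OLS on demeaned data: same coefficients, but residual dof = N - K
regress dm_n dm_w dm_k if touse, noconstant
scalar se_naive = _se[dm_w]

* Rescale from N - K to N - G - K residual degrees of freedom
scalar se_adj = scalar(se_naive) * sqrt(e(df_r) / (e(df_r) - scalar(n_fe)))
display "xtreg, fe: " scalar(se_fe) "   naive: " scalar(se_naive) "   adjusted: " scalar(se_adj)
```

The same counting issue arises whenever a within-transformed dataset is passed to a cross-section command.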

The two-way cluster-robust VCE implemented by -xtivreg2- allows the cluster option if at least one of the two clusters encompasses the FEs.  This was just a pragmatic decision on my part when implementing it in -xtivreg2-; I don't know of a literature reference.
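For reference, a sketch of the two-way cluster-robust VCE in the textbook form of Cameron, Gelbach and Miller, written for the OLS/within case with small-sample scaling factors omitted (we have not checked the exact factors or the IV form that -xtivreg2- uses). Here $\tilde X$ and $\tilde u$ are the within-transformed regressors and residuals, $A$ and $B$ are the two cluster variables, and $A\cap B$ clusters on their intersection:

$$\hat V_C = (\tilde X'\tilde X)^{-1}\Big(\sum_{g=1}^{G_C} \tilde X_g'\,\tilde u_g\,\tilde u_g'\,\tilde X_g\Big)(\tilde X'\tilde X)^{-1}, \qquad C \in \{A,\ B,\ A\cap B\}$$

$$\hat V_{A,B} = \hat V_A + \hat V_B - \hat V_{A\cap B}$$

Under the rule described above, at least one of $A$ or $B$ (id in the abdata example, houssiz in the nhanes2 example) must be constant within each panel.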

--Mark

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu 
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of 
> Austin Nichols
> Sent: 21 June 2011 20:58
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: First stage F stats - xtivreg
> 
> Mark--
> Try this example:
> 
> webuse nhanes2, clear
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz location)
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location)
> *above gives error "cluster option not supported if a panel spans more than one cluster"
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
> 
> The problem is that the OP wants fixed effects for persons (houssiz
> above) but should cluster by region (location above), which is not
> constant within person.  I suggested clustering by region (not
> feasible when regions vary over time within person), by person
> (clearly feasible), and by person and region as two dimensions of
> clustering (not clear why this would be feasible when clustering by
> region is not allowed).
> 
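The error quoted above is triggered by panels that span several clusters. One can count such panels directly before choosing cluster variables; a minimal sketch with official Stata commands on the same nhanes2 variables (this is our own check, not the code -xtivreg2- runs internally; spans, panel_tag and n_spanning are names we made up):

```stata
* Sketch: count panels (houssiz) that span more than one cluster (location)
webuse nhanes2, clear

* Within each panel, sort by the cluster variable and compare first vs last
bysort houssiz (location): generate byte spans = location[1] != location[_N]

* Tag one observation per panel so that each panel is counted once
egen byte panel_tag = tag(houssiz)
count if spans & panel_tag
scalar n_spanning = r(N)
display scalar(n_spanning) " panel(s) span more than one location"
```

For a cluster variable that is constant within each panel (houssiz itself, or an initial-region variable) the same count is zero, which is the nesting condition for at least one of the two cluster variables.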
> . which xtivreg2
> d:\ado\plus\x\xtivreg2.ado
> *! xtivreg2 1.0.12 17June2010
> *! author mes
> 
> . which ivreg2
> d:\ado\plus\i\ivreg2.ado
> *! ivreg2 3.0.06  30Jan2011
> *! authors cfb & mes
> *! see end of file for version comments
> 
> 
> On Tue, Jun 21, 2011 at 2:49 PM, Schaffer, Mark E 
> <M.E.Schaffer@hw.ac.uk> wrote:
> > Agnese, Austin,
> >
> > Am I missing something here?  Using abdata.dta,
> >
> > webuse abdata
> >
> > xtivreg2 n w k, fe cluster(id year)
> >
> > seems to work fine.  The panel identifier is id, which of
> > course spans different years, and year is one of the
> > cluster variables.
> >
> > Do you have the latest -xtivreg2-?  It should be
> >
> > . which xtivreg2, all
> >
> > c:\ado\personal\xtivreg2.ado
> > *! xtivreg2 1.0.12 17June2010
> > *! author mes
> >
> > --Mark
> >
> >> -----Original Message-----
> >> From: owner-statalist@hsphsun2.harvard.edu
> >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> >> Austin Nichols
> >> Sent: 21 June 2011 18:42
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: Re: st: First stage F stats - xtivreg
> >>
> >> Agnese Romiti <romitiagnese@gmail.com>:
> >> You are right about -xtivreg2- refusing to participate, so
> >> you could simply include dummies for every fixed effect in
> >> -ivreg2-, e.g.
> >>
> >> webuse nhanes2, clear
> >> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
> >> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
> >> qui ta houssiz, gen(d_)
> >> ivreg2 hlthstat (iron=lead) d_*, fwl(d_*) cluster(location sampl)
> >>
> >> Or you could cluster by initial region instead, e.g.
> >>
> >> bys i (t): g initregion=region[1]
> >>
> >> which involves different assumptions, but will also give you
> >> evidence of how the data seem to be clustered.
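The one-liner above uses placeholder names i, t and region. A self-contained version on toy data (ids, years and regions are made up for illustration) shows what it produces: each person's first observed region, which is constant within person, so the panels are nested within the resulting clusters and -xtivreg2- should accept cluster(initregion).

```stata
* Sketch: cluster on each person's initial region (toy data; id, year and
* region are hypothetical stand-ins for the i, t and region above)
clear
input id year region
1 2001 3
1 2002 3
1 2003 5
2 2001 7
2 2002 7
end

* Sort by time within person and copy the first observed region forward
bysort id (year): generate initregion = region[1]
list, sepby(id)
```

Person 1 moves from region 3 to region 5 but keeps initregion = 3 in every year.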
> >>
> >> On Tue, Jun 21, 2011 at 1:06 PM, Agnese Romiti
> >> <romitiagnese@gmail.com> wrote:
> >> > Dear Austin,
> >> >
> >> > When I used region-year, or only region, as the cluster unit, I had
> >> > to run -ivreg2- on data that I had previously transformed into
> >> > deviations from the mean (the within transformation), because
> >> > -xtivreg2- requires that no panel overlap more than one cluster;
> >> > panels must be uniquely assigned to clusters.
> >> > I tried instead to run -xtivreg2- with two clusters, as you suggested,
> >> > but I received the error message "cluster():  too many variables
> >> > specified", apparently because I don't have the latest version of
> >> > the commands. I have just run -update all-, and my Stata seems to be
> >> > updated to 30 March 2011 (exe and ado) and to 1 Sept 2010 (the
> >> > utilities). Is there a reason why I still get the error?
> >> >
> >> > Thanks
> >> > Agnese
> >> >
> >> >
> >> >
> >> >
> >> > 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> >> >> Agnese Romiti <romitiagnese@gmail.com>:
> >> >> I don't see how it matters that individuals move across clusters,
> >> >> unless you want to cluster by individual as well, and -xtivreg2-
> >> >> allows two dimensions of clustering. When you cluster by region-year,
> >> >> you assume that a draw from the dgp of person i in year t is
> >> >> independent of a draw from the dgp of person i in year t+1, which
> >> >> is clearly problematic.  You should try clustering by individual, by
> >> >> region, and then try two dimensions of clustering.  Let us know how
> >> >> the first-stage diagnostic statistics and SEs on the main variables
> >> >> of interest in each of those 3 cases compare to your
> >> >> region-year-clustered version.
> >> >>
> >> >> On Tue, Jun 21, 2011 at 10:47 AM, Agnese Romiti
> >> >> <romitiagnese@gmail.com> wrote:
> >> >>> Austin,
> >> >>>
> >> >>> I chose region-year as the cluster unit because individuals
> >> >>> (around 8 percent of them) move across regions over time, so
> >> >>> their region was not unique.
> >> >>>
> >> >>> Many thanks again for your help and the ref.
> >> >>> Agnese
> >> >>>
> >> >>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> >> >>>> Agnese Romiti <romitiagnese@gmail.com>: In that case the
> >> >>>> cluster-robust SE will be biased downward slightly, resulting in
> >> >>>> overrejection and an overstated first-stage F stat, but I expect
> >> >>>> it will still outperform the SE and F from clustering by
> >> >>>> region-year.  You would have to do simulations matching your
> >> >>>> exact setup to be sure; see e.g.
> >> >>>> http://www.stata.com/meeting/13uk/nichols_crse.pdf
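A worked version of why a downward-biased SE overstates the first-stage F, assuming the simplest case of one endogenous regressor and one excluded instrument (the thread does not say how many instruments are used). With first stage $x_{it} = \pi z_{it} + w_{it}'\gamma + \alpha_i + v_{it}$, the Wald form of the first-stage F for the single restriction $\pi = 0$ is the squared t statistic, so any understatement of the SE is squared in F:

$$F = \left(\frac{\hat\pi}{\widehat{\mathrm{se}}(\hat\pi)}\right)^{2}, \qquad \widehat{\mathrm{se}}(\hat\pi) = \lambda\,\mathrm{se}^{*}(\hat\pi) \;\Longrightarrow\; F = \frac{1}{\lambda^{2}}\left(\frac{\hat\pi}{\mathrm{se}^{*}(\hat\pi)}\right)^{2}$$

where $\mathrm{se}^{*}$ is a standard error that correctly reflects the within-cluster correlation and $0<\lambda<1$. For example, an SE understated by half ($\lambda = 1/2$) makes F four times too large.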
> >> >>>>
> >> >>>> On Tue, Jun 21, 2011 at 3:27 AM, Agnese Romiti
> >> >>>> <romitiagnese@gmail.com> wrote:
> >> >>>>> Hi,
> >> >>>>> Thanks again.
> >> >>>>> In my data I have 19 regions, with around 18 percent of the
> >> >>>>> data in the largest region.
> >> >>>>>
> >> >>>>> Agnese
> >> >>>>>
> >> >>>>>
> >> >>>>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> >> >>>>>> Agnese Romiti <romitiagnese@gmail.com>:
> >> >>>>>> No, you should cluster by region to correctly account for
> >> >>>>>> possible serial correlation, assuming you have
> >> sufficiently many
> >> >>>>>> regions in your data; how many are there?
> >> >>>>>> What percent of the data is in the largest region?
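Both questions (how many clusters, and what share of the data sits in the largest one) can be answered with official commands. A minimal sketch on toy data, where region is a hypothetical cluster variable and n_clusters, cl_size and max_share are names we made up:

```stata
* Sketch: number of clusters and share of observations in the largest one
* (toy data; region is a hypothetical cluster variable)
clear
input region
1
1
1
2
2
3
3
3
3
3
end

quietly tabulate region
scalar n_clusters = r(r)

* Cluster size, then the largest cluster as a share of all observations
bysort region: generate long cl_size = _N
quietly summarize cl_size
scalar max_share = r(max) / _N
display scalar(n_clusters) " clusters; largest holds " %4.1f 100*scalar(max_share) "% of observations"
```

With 10 observations in 3 regions and 5 observations in region 3, this reports 3 clusters and a 50.0% largest share.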
> >> >>>>>>
> >> >>>>>> On Mon, Jun 20, 2011 at 5:19 PM, Agnese Romiti
> >> >>>>>> <romitiagnese@gmail.com> wrote:
> >> >>>>>>> Many thanks, Austin.
> >> >>>>>>>
> >> >>>>>>> I'm actually clustering the standard errors at the region-year
> >> >>>>>>> level rather than at the region level, because I have one
> >> >>>>>>> regressor that varies at the region-year level. Is that correct?
> >> >>>>>>> Do you think that the high first-stage F stat might be a signal
> >> >>>>>>> of a bad instrument, like a failure of the exogeneity
> >> >>>>>>> requirement?
> >> >>>>>>>
> >> >>>>>>> Agnese
> >> >>>>>>>
> >> >>>>>>>
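Clustering at the region-year level requires one identifier per region-year cell. A minimal sketch with -egen, group()- on toy data (id, year and region are hypothetical variable names) also shows why panels are not nested in such clusters: a person who never moves still falls into a new cluster every year.

```stata
* Sketch: build a region-year cluster identifier (toy data; id, year and
* region are hypothetical variable names)
clear
input id year region
1 2001 3
1 2002 3
2 2001 3
2 2002 5
end

* One identifier per region-year cell
egen long region_year = group(region year)
list, sepby(id)
```

Person 1 stays in region 3 yet belongs to clusters 1 and 2, which is the independence-across-years assumption questioned earlier in the thread.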
> >> >>>>>>> 2011/6/20 Austin Nichols <austinnichols@gmail.com>:
> >> >>>>>>>> Agnese Romiti <romitiagnese@gmail.com>:
> >> >>>>>>>> Are you clustering by region to account for the likely
> >> >>>>>>>> correlation of errors within region?
> >> >>>>>>>> Also see
> >> >>>>>>>> http://www.stata.com/meeting/boston10/boston10_nichols.pdf
> >> >>>>>>>> for an alternative model suited to a nonnegative dep var.
> >> >>>>>>>>
> >> >>>>>>>> On Mon, Jun 20, 2011 at 3:49 AM, Agnese Romiti
> >> >>>>>>>> <romitiagnese@gmail.com> wrote:
> >> >>>>>>>>> Dear Statalist users,
> >> >>>>>>>>>
> >> >>>>>>>>> I'm running a fixed-effects model with IV (-xtivreg2-); my
> >> >>>>>>>>> dependent variable is a measure of labor supply at the
> >> >>>>>>>>> individual level (working hours), whereas my endogenous
> >> >>>>>>>>> variable varies only at the region-year level.
> >> >>>>>>>>> My question is about the first-stage statistics: the weak
> >> >>>>>>>>> identification test yields an extremely high F statistic,
> >> >>>>>>>>> F=3289, which makes me worry that something is wrong.
> >> >>>>>>>>> Do you have any clue about what might be driving this odd
> >> >>>>>>>>> result?
> >> >>>>>>>>>
> >> >>>>>>>>> Many thanks in advance for your help.
> >> >>>>>>>>>
> >> >>>>>>>>> Agnese
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-- 
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.



