Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: First stage F stats - xtivreg |
Date | Tue, 21 Jun 2011 21:33:44 +0100 |
Austin,

I think I see what is going on.

The root of the problem is the fixed effects incidental parameters problem. If you use the classical VCE estimator, it needs a dof adjustment to account for the FEs. If FEs are encompassed by clusters, then no dof adjustment is needed. But if the panel units/FEs overlap over clusters, then some dof adjustment would be needed. The problem is that, last I heard, no one had worked out what it should be:

http://www.stata.com/statalist/archive/2009-03/msg00353.html

The two-way cluster-robust VCE implemented by -xtivreg2- allows the cluster option if at least one of the two clusters encompasses the FEs. This was just a pragmatic decision on my part when implementing it in -xtivreg2-; I don't know of a literature reference.

--Mark

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Austin Nichols
> Sent: 21 June 2011 20:58
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: First stage F stats - xtivreg
>
> Mark--
> Try this example:
>
> webuse nhanes2, clear
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz location)
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location)
> *above gives error "cluster option not supported if a panel spans more than one cluster"
> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
>
> The problem is that the OP wants fixed effects for persons (houssiz above) but should cluster by region (location above) which is not constant within person. I suggested clustering by region (not feasible when regions vary over time within person), by person (clearly feasible), and by person and region as two dimensions of clustering (not clear why this would be feasible when clustering by region is not allowed).
>
> . which xtivreg2
> d:\ado\plus\x\xtivreg2.ado
> *! xtivreg2 1.0.12 17June2010
> *! author mes
>
> . which ivreg2
> d:\ado\plus\i\ivreg2.ado
> *! ivreg2 3.0.06 30Jan2011
> *! authors cfb & mes
> *! see end of file for version comments
>
>
> On Tue, Jun 21, 2011 at 2:49 PM, Schaffer, Mark E <M.E.Schaffer@hw.ac.uk> wrote:
> > Agnese, Austin,
> >
> > Am I missing something here? Using abdata.dta,
> >
> > webuse abdata
> >
> > xtivreg2 n w k, fe cluster(id year)
> >
> > seems to work fine. The panel identifier is id, and of course overlaps over different years, and year is one of the cluster variables.
> >
> > Do you have the latest -xtivreg2-? It should be
> >
> > . which xtivreg2, all
> >
> > c:\ado\personal\xtivreg2.ado
> > *! xtivreg2 1.0.12 17June2010
> > *! author mes
> >
> > --Mark
> >
> >> -----Original Message-----
> >> From: owner-statalist@hsphsun2.harvard.edu
> >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> >> Austin Nichols
> >> Sent: 21 June 2011 18:42
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: Re: st: First stage F stats - xtivreg
> >>
> >> Agnese Romiti <romitiagnese@gmail.com>:
> >> You are right about -xtivreg2- refusing to participate, so you could simply include dummies for every fixed effect in -ivreg2-, e.g.
> >>
> >> webuse nhanes2, clear
> >> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
> >> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
> >> qui ta houssiz, gen(d_)
> >> ivreg2 hlthstat (iron=lead) d_*, fwl(d_*) cluster(location sampl)
> >>
> >> Or you could cluster by initial region instead, e.g.
> >>
> >> bys i (t): g initregion=region[1]
> >>
> >> which involves different assumptions, but will also give you evidence of how the data seem to be clustered.
> >>
> >> On Tue, Jun 21, 2011 at 1:06 PM, Agnese Romiti <romitiagnese@gmail.com> wrote:
> >> > Dear Austin,
> >> >
> >> > When I used region-year, or region only, as the cluster unit, I had to run ivreg2 on data that I had previously transformed into deviations from the mean (within transformation), because xtivreg2 requires that no panel overlaps more than one cluster. So panels should be uniquely assigned to clusters.
> >> > I tried instead to run xtivreg2 with two clusters as you suggested, but I received the error message "cluster(): too many variables specified", apparently because I don't have the latest version of the commands. I have just done an update all and my Stata seems to be updated to 30March 2011 (exe and ado), and to 1Sept 2010 for the utilities. Is there a reason why I still get the error?
> >> >
> >> > Thanks
> >> > Agnese
> >> >
> >> >
> >> > 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> >> >> Agnese Romiti <romitiagnese@gmail.com>:
> >> >> I don't see how it matters that individuals move across clusters, unless you want to cluster by individual as well, and -xtivreg2- allows two dimensions of clustering. When you cluster by region-year, you assume that a draw from the dgp of person i in year t is independent from a draw from the dgp of person i in year t+1, which is clearly problematic. You should try clustering by individual, by region, and then try two dimensions of clustering. Let us know how the first stage diagnostic statistics and SEs on main variables of interest, in each of those 3 cases, compare to your region-year-clustered version.
> >> >>
> >> >> On Tue, Jun 21, 2011 at 10:47 AM, Agnese Romiti <romitiagnese@gmail.com> wrote:
> >> >>> Austin,
> >> >>>
> >> >>> The reason why I chose region-year as the cluster unit is that individuals - around 8 percent of them - move across regions over time, so the region was not unique for them.
> >> >>>
> >> >>> Many thanks again for your help and the ref.
> >> >>> Agnese
> >> >>>
> >> >>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> >> >>>> Agnese Romiti <romitiagnese@gmail.com>:
> >> >>>> In that case the cluster-robust SE will be biased downward slightly, resulting in overrejection and your first-stage F stat overstated, but I expect it will still outperform the SE and F clustering by region-year. You would have to do simulations matching your exact setup to be sure; see e.g.
> >> >>>> http://www.stata.com/meeting/13uk/nichols_crse.pdf
> >> >>>>
> >> >>>> On Tue, Jun 21, 2011 at 3:27 AM, Agnese Romiti <romitiagnese@gmail.com> wrote:
> >> >>>>> Hi,
> >> >>>>> Thanks again
> >> >>>>> In my data I have 19 regions, and around 18 percent of the data in the largest region.
> >> >>>>>
> >> >>>>> Agnese
> >> >>>>>
> >> >>>>>
> >> >>>>> 2011/6/21 Austin Nichols <austinnichols@gmail.com>:
> >> >>>>>> Agnese Romiti <romitiagnese@gmail.com>:
> >> >>>>>> No, you should cluster by region to correctly account for possible serial correlation, assuming you have sufficiently many regions in your data; how many are there?
> >> >>>>>> What percent of the data is in the largest region?
> >> >>>>>>
> >> >>>>>> On Mon, Jun 20, 2011 at 5:19 PM, Agnese Romiti <romitiagnese@gmail.com> wrote:
> >> >>>>>>> Many thanks Austin,
> >> >>>>>>>
> >> >>>>>>> I'm actually clustering the standard errors at region-year level rather than at region because I have one regressor with variability at region-year level. Is that correct?
> >> >>>>>>> Do you think that the high first stage F stats might be a signal of a bad instrument? Like a failure of the exogeneity requirement?
> >> >>>>>>>
> >> >>>>>>> Agnese
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> 2011/6/20 Austin Nichols <austinnichols@gmail.com>:
> >> >>>>>>>> Agnese Romiti <romitiagnese@gmail.com>:
> >> >>>>>>>> Are you clustering by region to account for the likely correlation of errors within region?
> >> >>>>>>>> Also see
> >> >>>>>>>> http://www.stata.com/meeting/boston10/boston10_nichols.pdf
> >> >>>>>>>> for an alternative model that allows your dep var to be nonnegative.
> >> >>>>>>>>
> >> >>>>>>>> On Mon, Jun 20, 2011 at 3:49 AM, Agnese Romiti <romitiagnese@gmail.com> wrote:
> >> >>>>>>>>> Dear Statalist users,
> >> >>>>>>>>>
> >> >>>>>>>>> I'm running a fixed effect model with IV (xtivreg2), my dependent variable is a measure of labor supply at the individual level (working hours), whereas I have an endogenous variable with variation only at regional-year level.
> >> >>>>>>>>> My question is about the first stage statistics: the weak identification test yields an extremely high F statistic, which makes me worry that something is wrong, i.e. F=3289.
> >> >>>>>>>>> Do you have any clue about potential reasons driving this odd result?
> >> >>>>>>>>>
> >> >>>>>>>>> Many thanks in advance for your help.
> >> >>>>>>>>>
> >> >>>>>>>>> Agnese
> >
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Heriot-Watt University is a Scottish charity registered under charity number SC000278.
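
A minimal sketch of the nesting condition discussed in this thread, assuming the nhanes2 variables from Austin's example (houssiz standing in for the panel/FE identifier, location for the cluster variable). The helper variables pairtag, nclus and unittag are illustrative names only and are not part of -xtivreg2-. The sketch counts how many panel units appear in more than one cluster, which is the situation in which -xtivreg2- rejects a one-way cluster() option.

```stata
* Illustrative check only; this is not what -xtivreg2- runs internally.
webuse nhanes2, clear

* Tag one observation per (panel unit, cluster) pair.
egen byte pairtag = tag(houssiz location)

* Number of distinct clusters in which each panel unit appears.
bysort houssiz: egen nclus = total(pairtag)

* Tag one observation per panel unit, then count the units
* that span more than one cluster.
egen byte unittag = tag(houssiz)
count if unittag & nclus > 1
```

A count of zero would mean every panel unit falls within a single cluster, i.e. the FEs are encompassed by the clusters, which is the case where, per Mark's message, no dof adjustment is needed.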