
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: First stage F stats - xtivreg


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: First stage F stats - xtivreg
Date   Tue, 21 Jun 2011 15:57:39 -0400

Mark--
Try this example:

webuse nhanes2, clear
xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz location)
xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location)
* above gives error "cluster option not supported if a panel spans more
* than one cluster"
xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)

The problem is that the OP wants fixed effects for persons (houssiz
above) but should cluster by region (location above) which is not
constant within person.  I suggested clustering by region (not
feasible when regions vary over time within person), by person
(clearly feasible), and by person and region as two dimensions of
clustering (not clear why this would be feasible when clustering by
region is not allowed).
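
A quick way to see the condition behind that error is to check directly
whether any panel spans more than one cluster. This is only a minimal
sketch, assuming the same nhanes2 variables as in the example above; the
variable names spans and tag are made up for illustration.

```stata
webuse nhanes2, clear
* Within each panel (houssiz), sort by the cluster variable (location) and
* flag the panel if its first and last cluster values differ.
bysort houssiz (location): gen byte spans = location[1] != location[_N]
* Tag one observation per panel so that each panel is counted once.
egen byte tag = tag(houssiz)
* spans == 1 marks panels that fall in more than one cluster.
tabulate spans if tag
```

If any panel has spans equal to 1, the cluster variable is not constant
within panel, which is the situation the error message describes. Missing
values of the cluster variable sort last, so a panel with some missing
values would also be flagged.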

. which xtivreg2
d:\ado\plus\x\xtivreg2.ado
*! xtivreg2 1.0.12 17June2010
*! author mes

. which ivreg2
d:\ado\plus\i\ivreg2.ado
*! ivreg2 3.0.06  30Jan2011
*! authors cfb & mes
*! see end of file for version comments


On Tue, Jun 21, 2011 at 2:49 PM, Schaffer, Mark E <[email protected]> wrote:
> Agnese, Austin,
>
> Am I missing something here?  Using abdata.dta,
>
> webuse abdata
>
> xtivreg2 n w k, fe cluster(id year)
>
> seems to work fine.  The panel identifier is id, and of course overlaps over different years, and year is one of the cluster variables.
>
> Do you have the latest -xtivreg2-?  It should be
>
> . which xtivreg2, all
>
> c:\ado\personal\xtivreg2.ado
> *! xtivreg2 1.0.12 17June2010
> *! author mes
>
> --Mark
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Austin Nichols
>> Sent: 21 June 2011 18:42
>> To: [email protected]
>> Subject: Re: st: First stage F stats - xtivreg
>>
>> Agnese Romiti <[email protected]>:
>> You are right about -xtivreg2- refusing to participate, so
>> you could simply include dummies for every fixed effect in
>> -ivreg2-, e.g.
>>
>> webuse nhanes2, clear
>> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
>> xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
>> qui ta houssiz, gen(d_)
>> ivreg2 hlthstat (iron=lead) d_*, fwl(d_*) cluster(location sampl)
>>
>> Or you could cluster by initial region instead, e.g.
>>
>> bys i (t): g initregion=region[1]
>>
>> which involves different assumptions, but will also give you
>> evidence of how the data seem to be clustered.
>>
>> On Tue, Jun 21, 2011 at 1:06 PM, Agnese Romiti
>> <[email protected]> wrote:
>> > Dear Austin,
>> >
>> > When I used region-year, or also only region, as the cluster unit, I
>> > had to run ivreg2 on data that I had previously transformed into
>> > deviations from the mean (within transformation), because xtivreg2
>> > requires that no panel overlaps more than one cluster. So panels
>> > should be uniquely assigned to clusters.
>> > I tried instead to run xtivreg2 with two clusters as you suggested,
>> > but I received the error message "cluster():  too many variables
>> > specified", apparently because I don't have the latest version of
>> > the commands. I have just done an update all, and my Stata seems to
>> > be updated to 30 March 2011 (exe and ado) and to 1 Sept 2010 (the
>> > utilities). Is there a reason why I still get the error?
>> >
>> > Thanks
>> > Agnese
>> >
>> >
>> >
>> >
>> > 2011/6/21 Austin Nichols <[email protected]>:
>> >> Agnese Romiti <[email protected]>:
>> >> I don't see how it matters that individuals move across clusters,
>> >> unless you want to cluster by individual as well, and -xtivreg2-
>> >> allows two dimensions of clustering. When you cluster by region-year,
>> >> you assume that a draw from the dgp of person i in year t is
>> >> independent from a draw from the dgp of person i in year t+1, which
>> >> is clearly problematic.  You should try clustering by individual, by
>> >> region, and then try two dimensions of clustering.  Let us know how
>> >> the first stage diagnostic statistics and SEs on main variables of
>> >> interest, in each of those 3 cases, compare to your
>> >> region-year-clustered version.
>> >>
>> >> On Tue, Jun 21, 2011 at 10:47 AM, Agnese Romiti
>> <[email protected]> wrote:
>> >>> Austin,
>> >>>
>> >>> The reason why I chose region-year as the cluster unit is that
>> >>> individuals - around 8 percent of them - move across regions over
>> >>> time, so the region was not unique for them.
>> >>>
>> >>> Many thanks again for your help and the ref.
>> >>> Agnese
>> >>>
>> >>> 2011/6/21 Austin Nichols <[email protected]>:
>> >>>> Agnese Romiti <[email protected]>:
>> >>>> In that case the cluster-robust SE will be biased downward slightly,
>> >>>> resulting in overrejection and an overstated first-stage F stat, but
>> >>>> I expect it will still outperform the SE and F from clustering by
>> >>>> region-year. You would have to do simulations matching your exact
>> >>>> setup to be sure; see e.g.
>> >>>> http://www.stata.com/meeting/13uk/nichols_crse.pdf
>> >>>>
>> >>>> On Tue, Jun 21, 2011 at 3:27 AM, Agnese Romiti
>> <[email protected]> wrote:
>> >>>>> Hi,
>> >>>>> Thanks again
>> >>>>> In my data I have 19 regions, and around 18 percent of the data
>> >>>>> is in the largest region.
>> >>>>>
>> >>>>> Agnese
>> >>>>>
>> >>>>>
>> >>>>> 2011/6/21 Austin Nichols <[email protected]>:
>> >>>>>> Agnese Romiti <[email protected]>:
>> >>>>>> No, you should cluster by region to correctly account for
>> >>>>>> possible serial correlation, assuming you have sufficiently many
>> >>>>>> regions in your data; how many are there?
>> >>>>>> What percent of the data is in the largest region?
>> >>>>>>
>> >>>>>> On Mon, Jun 20, 2011 at 5:19 PM, Agnese Romiti
>> <[email protected]> wrote:
>> >>>>>>> Many thanks Austin,
>> >>>>>>>
>> >>>>>>> I'm actually clustering the standard errors at the region-year
>> >>>>>>> level rather than at the region level because I have one
>> >>>>>>> regressor that varies at the region-year level. Is that correct?
>> >>>>>>> Do you think that the high first-stage F stat might be a signal
>> >>>>>>> of a bad instrument, such as a failure of the exogeneity
>> >>>>>>> requirement?
>> >>>>>>>
>> >>>>>>> Agnese
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> 2011/6/20 Austin Nichols <[email protected]>:
>> >>>>>>>> Agnese Romiti <[email protected]>:
>> >>>>>>>> Are you clustering by region to account for the likely
>> >>>>>>>> correlation of errors within region?
>> >>>>>>>> Also see
>> >>>>>>>> http://www.stata.com/meeting/boston10/boston10_nichols.pdf
>> >>>>>>>> for an alternative model that allows your dep var to be
>> >>>>>>>> nonnegative.
>> >>>>>>>>
>> >>>>>>>> On Mon, Jun 20, 2011 at 3:49 AM, Agnese Romiti
>> <[email protected]> wrote:
>> >>>>>>>>> Dear Statalist users,
>> >>>>>>>>>
>> >>>>>>>>> I'm running a fixed-effects model with IV (xtivreg2); my
>> >>>>>>>>> dependent variable is a measure of labor supply at the
>> >>>>>>>>> individual level (working hours), whereas my endogenous
>> >>>>>>>>> variable varies only at the region-year level.
>> >>>>>>>>> My question is about the first-stage statistics: the weak
>> >>>>>>>>> identification test gives an extremely high F statistic
>> >>>>>>>>> (F=3289), which makes me worry that something is wrong.
>> >>>>>>>>> Do you have any clue about potential reasons driving this
>> >>>>>>>>> odd result?
>> >>>>>>>>>
>> >>>>>>>>> Many thanks in advance for your help.
>> >>>>>>>>>
>> >>>>>>>>> Agnese


