Re: st: Bootstrapping & clustered standard errors (-xtreg-)

From: Stas Kolenikov
Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
Date: Thu, 8 Sep 2011 15:28:05 -0500


I would say that you are worried about exactly the wrong things. The
sampling weights control mostly for unequal probabilities of
selection, and for well-designed and well-conducted surveys,
non-response adjustments are not that large, while probabilities of
selection might differ quite notably. While it is true that if you can
fully condition on the design variables and non-response propensity,
you can ignore the weights, I have yet to see an example where that
would happen. Believing that your model is perfect is... uhm... naive,
to put it mildly; if anything, econometrics is moving away from
strong assumptions such as "my model is absolutely right" towards
robust methods of inference that allow for some minor deviations
from the "absolutely right" scenario.

There are no assumptions of normality made anywhere in the process of
calculating the standard errors. All arguments are asymptotic, and you
see z- rather than t-statistics in the output. In fact, the arguments
justifying the bootstrap are asymptotic as well.

You can still entertain the bootstrap idea, but basically the only way
to check that you've done it right is to compare the bootstrap
standard errors with the clustered standard errors. If they are about
the same, either of them is usable; if they are wildly different (say
by more than 50%), I would not trust either of them, but I would first
check that the bootstrap was done right.
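
A minimal sketch of that comparison, not a tested recipe. It assumes the
placeholder names depvar and indepvars from this thread, the panel
variables pid and svyyear, and clustering on the panel identifier pid
rather than on region, since that is the case where -vce(bootstrap)-
resamples whole panels without extra setup. The stored-estimate names
clustered and bootstrapped are made up for illustration.

```stata
* Sketch only: depvar and indepvars are the placeholders used in this thread.
* Assumption: clustering at the panel level (pid), not at the region level.
xtset pid svyyear

* Analytic cluster-robust standard errors
xtreg depvar indepvars, fe vce(cluster pid)
estimates store clustered

* Bootstrap standard errors; with -xtreg- whole panels are resampled
xtreg depvar indepvars, fe vce(bootstrap, reps(200) seed(1))
estimates store bootstrapped

* Coefficients and standard errors from both runs, side by side
estimates table clustered bootstrapped, b(%9.4f) se(%9.4f)
```

If the two sets of standard errors differ by much more than the 50%
mentioned above, the bootstrap setup is the first thing to recheck.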

I know that PNAS is a journal with a huge impact factor in the natural
sciences, but a statistics journal? Or an econometrics journal? I mean,
it's cool to have a paper there on your resume, but I doubt many Statalist
subscribers look at this journal for methodological insights (some
data miners or bioinformaticians or other statisticians on the margin
of computer science do publish in PNAS, though). I would not turn to
an essentially applied psychology paper for advice on clustered
standard errors.

The error that you report probably comes from the bootstrap producing
a sample with fewer cluster identifiers than regressors in your model.
Normally, this would be rectified by specifying the -idcluster()- option;
however, in some odd cases, the bootstrap samples may still be
underidentified. I don't know whether the fixed effects regression
should be prone to such empirical underidentification. It might be,
given that not all of the parameters of an arbitrary model are
identified (the slopes of the time-invariant variables aren't).

On Thu, Sep 8, 2011 at 3:30 AM, Tobias Pfaff wrote:
> Dear Stas, Cam,
> Thanks for your input!
> I want to bootstrap as a robustness check since the residuals of my FE
> regression are not normally distributed, and bootstrapping does not
> assume normality of the residuals
> (e.g., Headey et al. 2010, appendix p. 3).
> If I do bootstrapping with clustered standard errors as Jeff has explained I
> get the following error message:
> - insufficient observations
> an error occurred when bootstrap executed xtreg, posting missing values -
> Cam, you say that I would need custom bootstrap weights. My dataset provides
> individual weights with adjustments
> for non-response etc. I do not use weights for the regression because the
> possible selection bias is mitigated due
> to the fact that the variables which could cause the bias are included as
> control variables (e.g., income, employment
> status). Thus, I would argue that my model is complete and the unweighted
> analysis leads to unbiased estimators.
> 1. Would you still include weights for the bootstrapping?
> 2. Does bootstrapping need more degrees of freedom than the normal
> estimation of -xtreg- so that I get the above error message?
> 3. If bootstrapping is not a good idea in this case, what can I do to
> address the violation of the normality assumption of the residuals?
> (I already checked transformation of the variables, but that doesn't help)
> Regards,
> Tobias
> -----Original Message-----
>> Date: Wed, 7 Sep 2011 10:24:33 -0400
>> Subject: RE: st: Bootstrapping & clustered standard errors (-xtreg-)
>> From: Cameron McIntosh <>
>> To:
> Stas, Tobias
> I agree with Stas that there is not much point in using the bootstrap in
> this case, unless you have custom bootstrap weights computed by a
> statistical agency for a complex sampling frame, which would incorporate
> adjustments for non-response and calibration to known totals, etc. I don't
> think that is the case here, so I would go with the -cluster- SEs too.
> My two cents,
> Cam
>> Date: Wed, 7 Sep 2011 09:03:27 -0500
>> Subject: Re: st: Bootstrapping & clustered standard errors (-xtreg-)
>> From:
>> To:
>> Tobias,
>> can you please explain why you need the bootstrap at all? The
>> bootstrap standard errors are equivalent to the regular -cluster-
>> standard errors asymptotically (in this case, with the number of
>> clusters going off to infinity), and, if anything, it is easier to get
>> the bootstrap wrong than right with difficult problems. If -cluster-
>> option works at all with -xtreg-, I see little reason to use the
>> bootstrap. (Very technically speaking, in my simulations, I've seen
>> the bootstrap standard errors be more stable than the -robust- standard
>> errors with a large number of bootstrap repetitions, which has to be
>> in an appropriate relation to the sample size; whether that carries
>> over to the cluster standard errors, I don't know.)
>> On Tue, Sep 6, 2011 at 12:25 PM, Tobias Pfaff
>> <> wrote:
>> > Dear Statalisters,
>> >
>> > I do the following fixed effects regression:
>> >
>> > xtreg depvar indepvars, fe vce(cluster region) nonest dfadj
>> >
>> > Individuals in the panel are identified by the variable "pid". The
>> > time variable is "svyyear". Data were previously declared as panel
>> > data with -xtset pid svyyear-.
>> > Since one of my independent variables is clustered at the regional
>> > level (not at the individual level), I use the option
>> > -vce(cluster region)-.
>> >
>> > Now, I would like to do the same thing with bootstrapped standard
>> > errors. I tried several commands, however, none of them works so
>> > far. For example:
>> >
>> > xtreg depvar indepvars, fe vce(bootstrap, reps(3) seed(1) cluster(region)) nonest dfadj
>> >
>> > where I get the error message "option cluster() not allowed".
>> >
>> > None of the hints in the manual (e.g., -idcluster()-, -xtset,
>> > clear-, -i()- in the main command) were helpful so far.
>> >
>> > How can I tell the bootstrapping command that the standard errors
>> > should be clustered at the regional level while using "pid" for
>> > panel individuals?
>> >
>> > Any comments are appreciated!

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.