# Re: st: bootstrapping and time series

 From Stas Kolenikov <[email protected]> To [email protected] Subject Re: st: bootstrapping and time series Date Fri, 8 Oct 2004 15:05:34 -0400

```
On Fri, 8 Oct 2004 18:50:31 +0100, Nick Cox <[email protected]> wrote:
> I'm sure Stas (and Jeff Pitblado) are right here.
>
> However, do note that there is literature on special
> bootstrapping methods for time series (see e.g. Politis _Statistical
> Science_ 2003). The point is that Stata's
> -bootstrap- implements none of these methods.
>
> How far such methods extend to panel data I
> do not know.

Well, if you can assume that your panels (i.e., individuals observed
over time) are independent, then the panel is the appropriate unit to
resample, and that can be handled by Stata's -bsample-. If -id- is the
panel ID and -year- is your time variable, so that your data are set up with

tsset id year

then you can resample your data by

bsample ... , ...cluster(id) newcluster(newid)

and then set it up again with

tsset newid year

within your estimation routine, i.e., before -ivreg- or whatever you
want to do with it. Note that this necessarily involves some
programming: your program will have at least two lines:

tsset newid year
ivreg2 ....

Then, with the estimation results still in memory, you can use
_b[whatever] in the -exp_list- of -bs-.

Additional statistical inconveniences arise, however. It is known that
the bootstrap distribution converges faster for pivotal statistics
(i.e., those whose limiting distribution is fully known and does not
depend on unknown parameters). The distinction here is between saving
the coefficients _b[whatever] and the t-statistics
_b[whatever]/_se[whatever]. The latter will converge to N(0,1), while
the former will converge to N(true beta, sampling variance of beta-hat).

Wait a second. The t-statistic will converge to that nice N(0,1) only
if the -whatever- variable has no effect. So you need to resample under
the null hypothesis, i.e., sample from a distribution that keeps all
the explanatory variables as they are and sets the dependent variable
equal to 0+0*x1+0*x2+error, where error follows exactly the same
distribution it has in our data, which is not observed... and so on,
and so on.

To sum up again: doing the bootstrap properly involves many more
assumptions than one usually seems to think, and I have not yet gotten
into the discussion of whether the bootstrap will give you a reasonable
estimate of the variance / distribution (which involves yet another
layer of highly technical asymptotic results that still rely on
regularity and mixing assumptions).

My personal take on the bootstrap is to use it ONLY when (i) you know
the standard errors provided by Stata are totally wrong and there is
no way to correct them analytically (for the two-stage econometric
applications that people often complain they cannot get standard
errors for, Murphy-Topel standard errors should work; I hope Mark
Schaffer can correct me if that's not so); and (ii) you fully know what
you are doing with the bootstrap and what assumptions the bootstrap
procedure implies. (Think of installing yet another piece of software
and clicking the "I agree" button without reading the small print. On
the second screen, which you have not read, it says: "You are entitled
to use this software for 30 days for evaluation purposes. If you have
not purchased the full license by the 31st day, your hard drive will be
formatted.")

Please don't get me wrong: the bootstrap is a very powerful technique,
but, as with all powerful techniques, you need to know how to use it. I
know enough to warn against using it when I see reasons for it to break
down, such as dependent or heterogeneous data.

--
Stas Kolenikov
http://stas.kolenikov.name
```
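A minimal sketch of the panel-resampling recipe in the message, written for current Stata rather than the 2004 version. In current Stata, -bsample- creates the new panel identifier through its idcluster() option. The sketch assumes the nlswork example dataset (variables idcode, year, ln_wage, tenure). It uses -regress- with a lag in place of -ivreg2-, which is user-written. The model, the 200 replications, the seed, and the file name bsresults are illustrative choices only. Each replication saves both the coefficient and its t-statistic, which matches the distinction drawn in the message.

```stata
* Sketch: panel (cluster) bootstrap that resamples whole individuals.
* Assumes the nlswork example data; substitute your own id, time and model.
webuse nlswork, clear
tsset idcode year
set seed 12345
tempname sims
postfile `sims' b_tenure t_tenure using bsresults, replace
forvalues r = 1/200 {
    preserve
    * draw panels with replacement; each drawn copy gets its own id in newid
    bsample, cluster(idcode) idcluster(newid)
    * re-declare the panel structure so lags do not run across duplicated panels
    quietly tsset newid year
    quietly regress ln_wage L.ln_wage tenure
    * save the coefficient and its t-statistic for this replication
    post `sims' (_b[tenure]) (_b[tenure]/_se[tenure])
    restore
}
postclose `sims'
use bsresults, clear
* the standard deviation of b_tenure is the bootstrap standard error
summarize b_tenure t_tenure
```

These draws are not generated under the null hypothesis. As the message points out, t_tenure will therefore be centered near zero only if tenure has no effect.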