Statalist



Re: st: BCa bootstrap CIs: must I jackknife the entire sample?


From   Roger Harbord <rmharbord@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: BCa bootstrap CIs: must I jackknife the entire sample?
Date   Thu, 11 Feb 2010 16:41:53 +0000

As no one has replied, I'll attempt to wrap this up myself for the sake
of posterity, based on a bit of further reading and a little further
thought:

I now think there's no point in calculating acceleration factors in a
sample this large, and that I'm safe to use bias-corrected (BC)
confidence intervals rather than bias-corrected and accelerated (BCa)
CIs.

The acceleration factor 'a' is a second-order correction that goes as
1/sqrt(N). With N=50000 it is therefore of the order of
1/sqrt(50000) = 1/224 (about 0.0045), which is too small to matter.
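
A minimal Python sketch of that scaling, for anyone who wants to check
it. It uses the usual textbook jackknife formula for 'a' and, purely as
an illustrative assumption, the sample mean of simulated exponential
data as the estimator (not the non-standard procedure from my original
question). The function names are hypothetical.

```python
import math
import random

def acceleration(jack):
    """Acceleration 'a' from leave-one-out (jackknife) estimates:
    sum(d^3) / (6 * sum(d^2)^1.5), where d = mean(jack) - jack[i]."""
    n = len(jack)
    mean = sum(jack) / n
    dev = [mean - t for t in jack]
    num = sum(d ** 3 for d in dev)
    den = 6.0 * sum(d ** 2 for d in dev) ** 1.5
    return num / den

def jackknife_means(x):
    """Leave-one-out estimates of the sample mean, computed in O(n)."""
    total = sum(x)
    n = len(x)
    return [(total - xi) / (n - 1) for xi in x]

# Assumed example data: right-skewed exponential draws, fixed seed.
rng = random.Random(12345)
for n in (50, 5000, 50000):
    x = [rng.expovariate(1.0) for _ in range(n)]
    a = acceleration(jackknife_means(x))
    # Print 'a' next to 1/sqrt(n) to compare orders of magnitude.
    print(n, round(a, 5), round(1 / math.sqrt(n), 5))
```

For the sample mean, 'a' reduces to the sample skewness divided by
6*sqrt(n), so it shrinks like 1/sqrt(n) however skewed the data are.
Other estimators will have a different constant in front.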

I was also confused about the role of skewness. The skewness of the
jackknife estimates is used in computing the acceleration factor 'a'.
But the fact that a bootstrap distribution is noticeably skew doesn't
mean a bias-corrected and accelerated (BCa) confidence interval would
differ noticeably from a bias-corrected (BC) interval. It does mean
that BC, BCa and percentile-based intervals will all differ noticeably
from a normal-approximation interval, and that the latter would have
poor coverage.
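
To get a feel for how little an 'a' of this size moves the interval,
here is a rough sketch that computes the percentile levels of the
bootstrap distribution used for the endpoints, following the usual
textbook BCa definition (setting a=0 gives the BC interval). The
function name and the value z0=0.1 are assumptions for illustration
only, not numbers from my analysis.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def bca_levels(z0, a, alpha=0.05):
    """Lower and upper percentile levels for a BCa interval.

    z0 is the bias-correction constant, a the acceleration.
    Each level is Phi(z0 + (z0 + z_p) / (1 - a * (z0 + z_p))).
    """
    levels = []
    for p in (alpha / 2, 1 - alpha / 2):
        z = z0 + _STD_NORMAL.inv_cdf(p)
        levels.append(_STD_NORMAL.cdf(z0 + z / (1 - a * z)))
    return tuple(levels)

z0 = 0.1  # hypothetical bias correction, chosen only for illustration
print("BC :", bca_levels(z0, a=0.0))
print("BCa:", bca_levels(z0, a=1 / 224))
```

The endpoints are then read off as those percentiles of the bootstrap
estimates, so any skewness in the bootstrap distribution is carried
into BC and BCa intervals alike; 'a' only nudges which percentiles are
used.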

Roger.

On Wed, Feb 10, 2010 at 5:59 PM, Roger Harbord <rmharbord@googlemail.com> wrote:
> Dear Statalisters,
>
> I have a dataset with fifty thousand observations, and a non-standard
> estimation procedure for which i'd like to produce bias-corrected and
> accelerated (BCa) bootstrap confidence intervals. My problem is that
> the standard method of calculating the acceleration factor 'a'
> requires jackknifing the entire dataset, i.e. calculating the estimate
> leaving out each and every observation in turn, requiring fifty
> thousand runs of my estimation procedure. I don't want to wait that
> long and neither do my collaborators! To me it seems reasonable to
> instead calculate 'a' from a random sample of leave-one-out estimates
> - perhaps a thousand or more but far less than the whole fifty
> thousand. Can anyone see any problems with this?
>
> I can't believe i'm the first to come across this issue. Does anyone
> know of any literature discussing this? (Unfortunately the potentially
> relevant textbooks are on loan from our library at present, but i
> haven't found anything relevant from an hour or so's perusal of
> journal articles.) Is there any way of persuading the official
> -bootstrap- command to do this, or am i going to have to knit my own?
>
> And yes, i have examined the distribution of the bootstrap estimates
> and in a few cases they are noticeably skew, even with this large a
> sample, so i have reason for thinking BCa CIs could be a good idea.
>
> Roger.
> --
> Roger Harbord
> http://www.epi.bris.ac.uk/staff/rharbord.htm