Statalist The Stata Listserver


RE: st: Subsample Bootstrap

From   Maarten Buis
Subject   RE: st: Subsample Bootstrap
Date   Fri, 20 Jan 2006 18:07:18 +0000 (GMT)

Tim R. Sass wrote:
> The procedure works fine on a moderately sized sample, but each 
> replication of the bootstrap takes over three hours on my full
> sample (over 1 million obs.).  In order to speed up the process
> I would like to perform each repetition of the bootstrap on a 
> subsample of the data, say 100,000 observations.  This can of
> course be done by setting the size() option in bootstrap.  The
> FAQs warn against this, however, saying "the standard error
> estimates are dependent upon the number of observations in each
> replication.  

The standard error says something about the precision of your estimate: the more information
you use (more observations, or stronger model assumptions), the more precise your estimate. So an
estimate based on 100,000 observations should be less precise (larger s.e.) than an
estimate based on 1,000,000 observations. The difference is likely to be less than a factor of ten,
though: standard errors typically shrink with the square root of the sample size, so expect a
factor of about sqrt(10), or roughly 3. So if you estimate a s.e. using 100,000 observations you
will get a larger s.e. than when you use your entire dataset, regardless of whether you use the
bootstrap or some analytical method. The question is, do you care? If all your effects are already
significant using 100,000 observations, then you probably don't.

However, if you do, then one solution that could work is to divide your dataset (randomly) into ten
datasets of 100,000 each, estimate your model in each dataset, and combine the estimates using
meta-analysis techniques. I have no experience using meta-analysis in Stata, but your problem is
relatively simple: no worries about publication bias or whether the different "studies" actually
measure the same thing. So that should not be that hard. You could probably do it by hand, but I
don't have my meta-analysis books here, so I can't give you the formulas.
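
For illustration, the standard fixed-effect (inverse-variance) way of combining independent
estimates weights each one by 1/se^2; the pooled s.e. is then the square root of 1 over the sum
of the weights. The sketch below is in Python rather than Stata, purely to show the arithmetic.
The function name and the ten coefficients and standard errors are made up for the example; in
practice they would come from fitting the model on each random tenth of the data. This assumes
the ten subsamples are disjoint, so the estimates are independent.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Combine independent estimates with inverse-variance (fixed-effect) weights.

    Hypothetical helper for illustration; returns (pooled estimate, pooled s.e.).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # Each estimate is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Made-up coefficient and s.e. from each of ten disjoint random subsamples.
b = [0.52, 0.48, 0.50, 0.51, 0.49, 0.47, 0.53, 0.50, 0.49, 0.51]
se = [0.03] * 10

pooled_b, pooled_se = pool_fixed_effect(b, se)
print(f"pooled estimate = {pooled_b:.4f}, pooled s.e. = {pooled_se:.4f}")
```

With equal standard errors the pooled estimate is just the plain average, and the pooled s.e.
is the per-subsample s.e. divided by sqrt(10), which is about what you would expect from using
all 1,000,000 observations at once.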

> Alternatively, I thought about just taking the estimated
> coefficients from each repetition of the bootstrap and then
> forming an empirical distribution from these estimates to get
> the standard errors.  But I am not quite sure how to
> accomplish this in Stata.

No need to do that: this is what bootstrap already does for you.
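
To make that concrete: the bootstrap standard error is simply the standard deviation of the
estimates obtained across the replications. Below is a minimal sketch of the idea in Python
(not Stata, and not Stata's actual implementation), using the sample mean of simulated data as
a stand-in for the model coefficient. The function name, the seeds, and the number of
replications are arbitrary choices for the example.

```python
import random
import statistics

def bootstrap_se(data, estimator, reps=200, seed=12345):
    """Bootstrap s.e.: the standard deviation of the estimator over resamples.

    Hypothetical helper for illustration; each resample has the same size as data.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # The empirical spread of the replicate estimates is the bootstrap s.e.
    return statistics.stdev(estimates)

# Simulated data standing in for the real sample.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(500)]

se_boot = bootstrap_se(sample, statistics.mean)
se_analytic = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap s.e. = {se_boot:.4f}, analytic s.e. = {se_analytic:.4f}")
```

For the sample mean the two numbers should be close, which is a handy sanity check. Note that
each resample here has the same size as the original data; drawing smaller resamples would give
a larger s.e., which is the point the FAQ is making about size().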


Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214 

+31 20 5986715
