st: RE: RE: bootstrap reject()

From   Nick Cox
Subject   st: RE: RE: bootstrap reject()
Date   Tue, 1 Nov 2011 13:46:03 +0000

However, if this were being talked about in a paper I was reviewing, I would recommend a two-stage approach:

1. Save the bootstrap results to a new dataset. 

2. Analyse the bootstrap results with and without the restriction of interest. 

That way, anyone can see what difference it makes. If you just throw away results you don't want, and throw away the bin too, people can't even rummage in the bin to see what it was that you discarded. 
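The second stage might look something like the sketch below. It assumes the replicates were saved to bootrep1.dta with the statistic named costdiff, as in the -bootstrap- call quoted further down; the limits of -5000 and 5000 are purely hypothetical placeholders for whatever restriction is being considered.

```stata
* Stage 2: analyse the saved replicates with and without the restriction.
* Assumes bootrep1.dta holds one variable, costdiff, per replicate.
use bootrep1, clear

* Hypothetical limits; replace with the restriction of interest
local lo = -5000
local hi =  5000

* All replicates: the SD reported here is the bootstrap SE
summarize costdiff, detail

* How many replicates would the restriction discard?
count if !inrange(costdiff, `lo', `hi')

* Restricted replicates only
summarize costdiff if inrange(costdiff, `lo', `hi'), detail
```

Reporting both sets of results side by side shows readers how much the restriction matters.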

In work I see, there is overwhelming emphasis on just using bootstrapping to get data-based confidence intervals and too little use made of the scope for bootstrapping to tell you something about the entire sampling distribution. Add all the appropriate caveats you like. 
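As a rough illustration of looking at the whole sampling distribution, and again assuming the replicates sit in bootrep1.dta under the name costdiff, a few standard commands go a long way:

```stata
* Inspect the shape of the bootstrap distribution, not just its SD.
* Assumes bootrep1.dta contains the replicate variable costdiff.
use bootrep1, clear

* Histogram with a normal density overlaid for comparison
histogram costdiff, normal

* Normal quantile plot: outliers and skewness show up in the tails
qnorm costdiff

* Selected percentiles of the replicates
centile costdiff, centile(2.5 50 97.5)
```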


-----Original Message-----
From: Nick Cox
Sent: 01 November 2011 13:33
Subject: st: RE: bootstrap reject()

Looking at the help for -bootstrap- I see an option -reject()- which appears to do exactly what you ask. 
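A minimal sketch of how that might be typed, assuming the poster's program is called bootstrap1 and returns r(costdiff). The cutoff of 5000 is hypothetical. As I read the help, replicates for which the expression is true have their results set to missing, so check the reported number of replications afterwards.

```stata
* Hypothetical cutoff: reject replicates whose cost difference
* exceeds 5000 in absolute value. bootstrap1 is assumed to be the
* poster's rclass program returning r(costdiff).
bootstrap costdiff=r(costdiff), reps(1000) seed(1234) ///
    reject(abs(r(costdiff)) > 5000): bootstrap1
```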

I don't think we can comment on correctness. If this makes sense in context, it makes sense in context. What the people you report to might say is difficult to second-guess, unless this is a matter of following their instructions. 


Shehzad Ali

A colleague asked this question. Is there a way to specify a range (or rejection region) for the parameter of interest when using the -bootstrap- program in Stata? She is using GLM for her cost data, followed by predicting costs for the treatment and control arms, then taking the difference between mean costs in treatment and control arms. The standard error of the difference in mean costs is then estimated by bootstrapping the process. In order to execute this, she is using the following general approach:

capture program drop bootstrap1

glm model followed by predicting costs for treatment=1 and treatment=0. Then the difference between mean predicted costs for the two groups is saved as a scalar 'costdiff'.

Subsequently, Stata's -bootstrap- is used to repeat this process 1,000 times.

bootstrap costdiff=r(costdiff), reps(1000) saving(bootrep1, replace) seed(1234): 
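For readers who want something concrete, the outline above could be filled in roughly as follows. This is only a sketch under stated assumptions, not the colleague's actual code: the variable names cost, treat and age, the gamma family with log link, and the choice to predict for everyone under treat=1 and treat=0 are all guesses at what the description means.

```stata
* Hypothetical sketch: cost, treat and age are assumed variable names,
* and the gamma/log GLM is an assumed specification.
capture program drop bootstrap1
program define bootstrap1, rclass
    tempvar orig yhat1 yhat0
    tempname m1
    quietly {
        glm cost treat age, family(gamma) link(log)
        * Keep the observed arm indicator so it can be restored
        generate double `orig' = treat
        * Predicted mean cost for everyone as if treated
        replace treat = 1
        predict double `yhat1', mu
        * Predicted mean cost for everyone as if in the control arm
        replace treat = 0
        predict double `yhat0', mu
        replace treat = `orig'
        summarize `yhat1', meanonly
        scalar `m1' = r(mean)
        summarize `yhat0', meanonly
    }
    * Difference in mean predicted costs, treatment minus control
    return scalar costdiff = `m1' - r(mean)
end

bootstrap costdiff=r(costdiff), reps(1000) saving(bootrep1, replace) seed(1234): bootstrap1
```

Because the program restores treat before exiting, each call leaves the resampled data as it found them.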

However, some of the predicted cost differences are very large (in more than 5% of the bootstrap samples), and the bootstrap SE is also very large (several times the SE in the observed sample). Is there a way to restrict the predicted cost difference to a range (for instance, a range close to the difference in the observed sample), or would this approach be incorrect?
