Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: bootsrap random number use


From   philippe van kerm <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: bootsrap random number use
Date   Mon, 7 Oct 2013 16:16:19 +0000

> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Sergiy Radyakin
> Sent: Monday, October 07, 2013 4:02 PM
> To: [email protected]
> Subject: Re: st: bootsrap random number use
> 
> On Mon, Oct 7, 2013 at 6:48 AM, philippe van kerm
> <[email protected]> wrote:
> > It seems to me the -bsample- code is simply meant to avoid the
> > explicit loop over observations (and so is fast even with many
> > observations), but does not do extra magic otherwise. I would think the
> > second uniform() ensures that the bootstrap draw does not depend on the
> > initial sort order of the data.
> 
> Dear Philippe, thank you for this addition, but I still don't get it:
> how would the draws depend on the sort order of the data? 

Sergiy, 

I think it is because of the particular way -bsample- is coded. And in fact, my statement was inaccurate: it is a reproducibility issue.

Internally -bsample- does
    gen double `r' = int(uniform()*_N + 1)
    gen double `w' = uniform()
    sort `r' `w'
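The second variable (`w') ensures that the sort order is identical across repetitions for a given value of the seed. Without it, observations tied on `r' would be ordered arbitrarily by -sort-, and that ordering is not controlled by -set seed-, so reproducibility would not be guaranteed.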

I convinced myself by repeating this code a few times:

 clear
 set seed 12345
 set obs 1000
 gen id = _n
 generate ui = floor((_N)*runiform() + 1) 
 generate w = runiform()
 sort ui 
 list id ui  in 1/10
 sort ui w
 list id ui  in 1/10
 
Despite the -set seed- statement, the sort order -sort ui- is not identical across replications, while it is always the same after -sort ui w-. 

This is crucial to ensure reproducibility of -bsample- results. 
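
For anyone who wants to check this without comparing listings by eye, here is a minimal sketch that runs the experiment twice with the same seed and asserts that -sort ui w- puts every observation in the same position both times. The names pos1, pos2, run1 and run2 are mine, chosen for illustration, and it is meant to be run as a do-file so the temporary files persist across the loop.

```stata
* Run the same seeded experiment twice and record each id's position
* after -sort ui w-.
forvalues rep = 1/2 {
    clear
    set seed 12345
    set obs 1000
    gen long id = _n
    gen long ui = floor(_N*runiform() + 1)
    gen double w = runiform()
    sort ui w
    gen long pos`rep' = _n          // position after the two-key sort
    keep id pos`rep'
    sort id                         // id is unique, so this sort has no ties
    tempfile run`rep'
    save `run`rep''
}

* Compare the two runs observation by observation.
use `run1', clear
merge 1:1 id using `run2', nogenerate
assert pos1 == pos2                 // same seed, same order
```

Replacing -sort ui w- with -sort ui- in the loop should make the final -assert- fail on most runs, because the many ties on ui are then broken arbitrarily.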

> Is there
> such a problem with my model code? Note that I don't loop over
> observations, I loop over -draws-. Performance is not an issue here,
> but the amount of randomness is. Even if I can't recover the logic
> behind the bootstrap, can I be absolutely confident that it will
> require 2*N*k random numbers for k iterations? Or is it (N+1)*k?

I think the different number of random numbers required by -bsample- and your code just reflects coding differences. It is not inherent to the bootstrap. I would guess you could code it with N random numbers, if that matters. 
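
A minimal sketch of what that could look like, assuming each call to runiform() consumes one random number and using the 74-observation auto data as a stand-in for the dataset in the original question. The variable names pick and price_b are hypothetical. No sort is involved, so no second random variable is needed to break ties; the draw depends only on the seed and on the order of the data when pick is generated.

```stata
* One bootstrap draw using _N calls to runiform().
sysuse auto, clear                          // 74 observations
set seed 12345

* Index of the observation to copy, drawn with replacement from 1.._N.
gen long pick = floor(_N*runiform()) + 1

* Explicit subscripting copies the value from the picked observation.
gen double price_b = price[pick]

summarize price price_b
```

This resamples a single variable; to resample whole observations, each variable of interest would have to be subscripted by pick in the same way.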

Philippe

> Thank you, Sergiy Radyakin
> 
> 
> >
> > Philippe
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:owner-
> >> [email protected]] On Behalf Of Sergiy Radyakin
> >> Sent: Saturday, October 05, 2013 1:15 AM
> >> To: [email protected]
> >> Subject: Re: st: bootsrap random number use
> >>
> >> On Fri, Oct 4, 2013 at 6:47 PM, Stas Kolenikov <[email protected]>
> >> wrote:
> >> > As far as I remember looking at the -bsample- code, which I never
> >> > understood, it also sorts the data this or that way when -expand-ing
> >> > the bootstrap frequencies.
> >>
> >> Yes, Stas, I also see the sorts, and yes, I also don't understand what
> >> it is doing exactly there. My view on bootstrap is that it is doing
> >> sampling with replacement,
> >> http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29#Case_resampling ,
> >> so should be similar to the following minimal code:
> >>
> >> http://www.radyakin.org/statalist/2013100401/picksample.do
> >>
> >> which takes exactly N random numbers to create a subsample (with
> >> replacement)
> >> from the original sample of N observations. If Stata requires more
> >> 'randomness', I
> >> assume it is doing something more complicated, and I am curious to
> >> know what is it.
> >>
> >> Thank you, Sergiy Radyakin
> >>
> >>
> >>
> >> >
> >> > -- Stas Kolenikov, PhD, PStat (ASA, SSC)
> >> > -- Senior Survey Statistician, Abt SRBI
> >> > -- Opinions stated in this email are mine only, and do not reflect
> >> the
> >> > position of my employer
> >> > -- http://stas.kolenikov.name
> >> >
> >> >
> >> >
> >> > On Fri, Oct 4, 2013 at 1:45 PM, Sergiy Radyakin
> >> <[email protected]> wrote:
> >> >> Dear Statalist,
> >> >>
> >> >> suppose I want to bootstrap myself. For a dataset with 74 observations
> >> >> to do two bootstrap iterations I would need to pick 2x74=148 random
> >> >> numbers, but Stata picks 296. Why?
> >> >>
> >> >> Thank you, Sergiy Radyakin
> >> >> *
> >> >> *   For searches and help try:
> >> >> *   http://www.stata.com/help.cgi?search
> >> >> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> >> >> *   http://www.ats.ucla.edu/stat/stata/

