Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Sergiy Radyakin <serjradyakin@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: bootsrap random number use
Date: Mon, 7 Oct 2013 12:52:40 -0400

On Mon, Oct 7, 2013 at 12:16 PM, philippe van kerm <philippe.vankerm@ceps.lu> wrote:
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
>> Sent: Monday, October 07, 2013 4:02 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: bootsrap random number use
>>
>> On Mon, Oct 7, 2013 at 6:48 AM, philippe van kerm <philippe.vankerm@ceps.lu> wrote:
>> > It seems to me the -bsample- code is simply meant to avoid the explicit loop over observations (and so is fast even with many observations), but does not do extra magic otherwise. I would think the second uniform() ensures that the bootstrap draw does not depend on the initial sort order of the data.
>>
>> Dear Philippe, thank you for this addition, but I still don't get it: how would the draws depend on the sort order of the data?
>
> Sergiy,
>
> I think it is because of the particular way -bsample- is coded. And in fact, my statement was inaccurate: it is a reproducibility issue.
>
> Internally -bsample- does
>     gen double `r' = int(uniform()*_N + 1)
>     gen double `w' = uniform()
>     sort `r' `w'
> The second variable (`w') ensures that the sort order is identical across repetitions for a given value of the seed (this would not be guaranteed otherwise).

Dear Philippe,

I guess I understand now (with your explanation and the following post from Bill Gould http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/) that the second random variable would ensure the results are reproducible. However, collisions in `w' are still possible, though waaay less likely because of its 'double' precision.

It seems to me the same result could have been achieved if the above quoted code was:

    gen double `r' = uniform()*_N + 1
    sort `r'
    replace `r'=int(`r')

but this code would use only one random number per observation.
Thank you, Sergiy Radyakin

>
> I convinced myself by repeating this code a few times:
>
>     clear
>     set seed 12345
>     set obs 1000
>     gen id = _n
>     generate ui = floor((_N)*runiform() + 1)
>     generate w = runiform()
>     sort ui
>     list id ui in 1/10
>     sort ui w
>     list id ui in 1/10
>
> Despite the -set seed- statement, the sort order -sort ui- is not identical across replications, while it is always the same after -sort ui w-.
>
> This is crucial to ensure reproducibility of -bsample- results.
>
>> Is there such a problem with my model code? Note that I don't loop over observations, I loop over -draws-. Performance is not an issue here, but the amount of randomness is. Even if I can't recover the logic behind the bootstrap, can I be absolutely confident that it will require 2*N*k random numbers for k iterations? Or is it (N+1)*k?
>
> I think the different number of random numbers required by -bsample- and your code just reflects coding differences. It is not inherent to the bootstrap. I would guess you could code it with N random numbers, if that matters.
>
> Philippe
>
>> Thank you, Sergiy Radyakin
>>
>> > Philippe
>> >
>> >> -----Original Message-----
>> >> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
>> >> Sent: Saturday, October 05, 2013 1:15 AM
>> >> To: statalist@hsphsun2.harvard.edu
>> >> Subject: Re: st: bootsrap random number use
>> >>
>> >> On Fri, Oct 4, 2013 at 6:47 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> >> > As far as I remember looking at the -bsample- code, which I never understood, it also sorts the data this or that way when -expand-ing the bootstrap frequencies.
>> >>
>> >> Yes, Stas, I also see the sorts, and yes, I also don't understand what it is doing exactly there.
>> >> My view on bootstrap is that it is doing sampling with replacement,
>> >> http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29#Case_resampling ,
>> >> so should be similar to the following minimal code:
>> >>
>> >> http://www.radyakin.org/statalist/2013100401/picksample.do
>> >>
>> >> which takes exactly N random numbers to create a subsample (with replacement) from the original sample of N observations. If Stata requires more 'randomness', I assume it is doing something more complicated, and I am curious to know what it is.
>> >>
>> >> Thank you, Sergiy Radyakin
>> >>
>> >> >
>> >> > -- Stas Kolenikov, PhD, PStat (ASA, SSC)
>> >> > -- Senior Survey Statistician, Abt SRBI
>> >> > -- Opinions stated in this email are mine only, and do not reflect the position of my employer
>> >> > -- http://stas.kolenikov.name
>> >> >
>> >> > On Fri, Oct 4, 2013 at 1:45 PM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
>> >> >> Dear Statalist,
>> >> >>
>> >> >> suppose I want to bootstrap myself. For a dataset with 74 observations to do two bootstrap iterations I would need to pick 2x74=148 random numbers, but Stata picks 296. Why?
>> >> >>
>> >> >> Thank you, Sergiy Radyakin
>> >> >> *
>> >> >> * For searches and help try:
>> >> >> * http://www.stata.com/help.cgi?search
>> >> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> >> >> * http://www.ats.ucla.edu/stat/stata/
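
The one-draw-per-observation variant proposed in the reply can be written out as a self-contained fragment. This is only a minimal sketch of that proposal, not the actual -bsample- code: the seed, the 74 observations, and the variable names id, r, and pick are illustrative assumptions, and runiform() is used as in Philippe's test code.

```stata
* Sketch: resampling indices from one random number per observation.
* Assumed for illustration: seed 12345, 74 observations, names id/r/pick.
clear
set seed 12345
set obs 74
generate long id = _n

* One continuous draw per observation; truncating it later yields an index in 1.._N.
generate double r = runiform()*_N + 1

* Sorting on the untruncated draw: the order is reproducible for a given
* seed only as long as no two values of r are equal.
sort r

* Truncate to obtain the index of the observation to resample.
generate long pick = int(r)
assert inrange(pick, 1, _N)

* Ties in r are unlikely but not impossible; -isid- stops with an error
* if r does not uniquely identify the observations.
isid r
```

Each replication consumes _N draws here, against 2*_N in the quoted -bsample- fragment. The price is that reproducibility rests on the values of r being distinct, which -isid- checks rather than guarantees.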

