Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: bootsrap random number use

From	Sergiy Radyakin <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: bootsrap random number use
Date	Mon, 7 Oct 2013 12:52:40 -0400

On Mon, Oct 7, 2013 at 12:16 PM, philippe van kerm
<[email protected]> wrote:
>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Sergiy Radyakin
>> Sent: Monday, October 07, 2013 4:02 PM
>> To: [email protected]
>> Subject: Re: st: bootsrap random number use
>>
>> On Mon, Oct 7, 2013 at 6:48 AM, philippe van kerm
>> <[email protected]> wrote:
>> > It seems to me the -bsample- code is simply meant to avoid the
>> > explicit loop over observations (and so is fast even with many
>> > observations), but does not do extra magic otherwise. I would think the
>> > second uniform() ensures that the bootstrap draw does not depend on the
>> > initial sort order of the data.
>>
>> Dear Philippe, thank you for this addition, but I still don't get it:
>> how would the draws depend on the sort order of the data?
>
> Sergiy,
>
> I think it is because of the particular way -bsample- is coded. And in fact, my statement was inaccurate: it is a reproducibility issue.
>
> Internally -bsample- does
>       gen double `r' = int(uniform()*_N + 1)
>       gen double `w' = uniform()
>       sort `r' `w'
> The second variable (`w') ensures that the sort order is identical across repetitions for a given value of the seed (this would not be guaranteed otherwise).
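
The quoted explanation rests on the fact that -sort- promises no particular order among observations tied on the sort key, so sorting on the integer draw alone leaves the order within ties unspecified. A minimal sketch of how the two-key version could be checked mechanically, assuming a do-file and the newer runiform() in place of uniform(); the names id, r, w, pos1 and pos2 are illustrative and not taken from -bsample- itself:

```stata
* Sketch only: id, r, w, pos1 and pos2 are illustrative names.
clear
set obs 1000
gen long id = _n
forvalues rep = 1/2 {
    sort id                                // restore the same starting order
    set seed 12345                         // same seed on both passes
    capture drop r
    capture drop w
    gen double r = int(runiform()*_N + 1)  // index of the resampled observation
    gen double w = runiform()              // second key, breaks ties in r
    sort r w
    gen long pos`rep' = _n                 // where each id landed on this pass
}
* If (r, w) identifies every row, both passes must give the same order.
isid r w
assert pos1 == pos2
```

Dropping w from the -sort- line would leave ties in r unresolved, so the final -assert- would no longer be guaranteed to hold.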


Dear Philippe, I guess I understand now (with your explanation and the
following post from Bill Gould
http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/)
that the second random variable would ensure the results are
reproducible. However, collisions in `w' are still possible, though
far less likely because of its 'double' precision. It seems to me
the same result could have been achieved if the code quoted above were:
        gen double `r' = uniform()*_N + 1
        sort `r'
        replace `r' = int(`r')
but this code would use only one random number per observation.
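
A minimal sketch of this single-draw variant, written with ordinary variables instead of tempvars and with runiform() in place of uniform(); the names id, u and pick are illustrative, and the -isid- line is an added guard, not part of the proposal. With one draw per observation, two replications on 74 observations would use 2 x 74 = 148 random numbers rather than the 296 reported for -bsample-.

```stata
* Sketch only: id, u and pick are illustrative names.
clear
set seed 12345
set obs 74
gen long id = _n
gen double u = runiform()*_N + 1   // one draw per observation
sort u                             // order fixed by u alone, barring exact ties
gen long pick = int(u)             // resampled observation number
isid u                             // exits with an error if two draws collide
assert inrange(pick, 1, _N)
```

The order after -sort u- is reproducible only as long as no two values of u coincide, which is the same caveat that applies to `w' in the two-key version.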

Thank you, Sergiy Radyakin




>
> I convinced myself by repeating this code a few times:
>
>  clear
>  set seed 12345
>  set obs 1000
>  gen id = _n
>  generate ui = floor((_N)*runiform() + 1)
>  generate w = runiform()
>  sort ui
>  list id ui  in 1/10
>  sort ui w
>  list id ui  in 1/10
>
> Despite the -set seed- statement, the sort order -sort ui- is not identical across replications, while it is always the same after -sort ui w-.
>
> This is crucial to ensure reproducibility of -bsample- results.
>
>> Is there
>> such a problem with my model code? Note that I don't loop over
>> observations, I loop over -draws-. Performance is not an issue here,
>> but the amount of randomness is. Even if I can't recover the logic
>> behind the bootstrap, can I be absolutely confident that it will
>> require 2*N*k random numbers for k iterations? Or is it (N+1)*k?
>
> I think the different number of random numbers required by -bsample- and your code just reflects coding differences. It is not inherent to the bootstrap. I would guess you could code it with N random numbers, if that matters.
>
> Philippe
>
>> Thank you, Sergiy Radyakin
>>
>>
>> >
>> > Philippe
>> >
>> >> -----Original Message-----
>> >> From: [email protected] [mailto:owner-
>> >> [email protected]] On Behalf Of Sergiy Radyakin
>> >> Sent: Saturday, October 05, 2013 1:15 AM
>> >> To: [email protected]
>> >> Subject: Re: st: bootsrap random number use
>> >>
>> >> On Fri, Oct 4, 2013 at 6:47 PM, Stas Kolenikov <[email protected]>
>> >> wrote:
>> >> > As far as I remember looking at the -bsample- code, which I never
>> >> > understood, it also sorts the data this or that way when -expand-ing
>> >> > the bootstrap frequencies.
>> >>
>> >> Yes, Stas, I also see the sorts, and yes, I also don't understand what
>> >> it is doing exactly there. My view on bootstrap is that it is doing
>> >> sampling with replacement,
>> >> http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29#Case_resampling ,
>> >> so should be similar to the following minimal code:
>> >>
>> >> http://www.radyakin.org/statalist/2013100401/picksample.do
>> >>
>> >> which takes exactly N random numbers to create a subsample (with
>> >> replacement) from the original sample of N observations. If Stata
>> >> requires more 'randomness', I assume it is doing something more
>> >> complicated, and I am curious to know what it is.
>> >>
>> >> Thank you, Sergiy Radyakin
>> >>
>> >>
>> >>
>> >> >
>> >> > -- Stas Kolenikov, PhD, PStat (ASA, SSC)
>> >> > -- Senior Survey Statistician, Abt SRBI
>> >> > -- Opinions stated in this email are mine only, and do not reflect the
>> >> > position of my employer
>> >> > -- http://stas.kolenikov.name
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Oct 4, 2013 at 1:45 PM, Sergiy Radyakin
>> >> > <[email protected]> wrote:
>> >> >> Dear Statalist,
>> >> >>
>> >> >> suppose I want to do the bootstrap myself. For a dataset with 74
>> >> >> observations, to do two bootstrap iterations I would need to pick
>> >> >> 2x74=148 random numbers, but Stata picks 296. Why?
>> >> >>
>> >> >> Thank you, Sergiy Radyakin
>> >> >> *
>> >> >> *   For searches and help try:
>> >> >> *   http://www.stata.com/help.cgi?search
>> >> >> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> >> >> *   http://www.ats.ucla.edu/stat/stata/
