Re: st: Problem with seed and bootstrap

 From wgould@stata.com (William Gould, Stata) To statalist@hsphsun2.harvard.edu Subject Re: st: Problem with seed and bootstrap Date Mon, 19 Sep 2005 10:53:22 -0500

```Svend Juul <SJ@SOCI.AU.DK> writes

> Imagine that -sort ... , stable- was the default, but that you could
> avoid it with an -unstable- option. Can anybody imagine a situation
> where a user would benefit from the -unstable- option?

My reply is that (1) -sort, stable- does consume more computer time,
and (2) -sort, unstable- uncovers bugs.

(2) requires some explanation.  We write down formulas all the time
that state the minimum assumptions necessary to carry forth a
calculation.  Such a formula might go,

1.  Make the following calculation within group

t_i = ...

2.  Then sum t_i to obtain the test statistic.

I know of a researcher (who shall remain nameless) who wrote down a
calculation exactly like that.  He did simulations, too, not using Stata, and
it worked well.  I cannot remember whether the paper actually made it to print,
but if not, it was on the way.

We were implementing this same test statistic in Stata at the researcher's
request.  He gave us some datasets and certified answers.  We wrote our
program and discovered that sometimes we got the claimed answer, and sometimes
we did not.  When we examined our code, we discovered that we got the "right"
answer if we added -stable- to -sort- at the "Make the following calculation
within group" step.

Problem was, nobody had noticed that the t_i calculation was not deterministic;
not the original author, not reviewers, and not his test runs.  Even so, the
formula was a function of within-group sort order.  This led to a
reconsideration, and an improvement.

Let me add that I have other, less dramatic examples of the benefits of
unstable.  In those less dramatic cases, it was not the formula that
was wrong; it was our code.

In a programming or procedural language, one states what is to be done.
A good language makes the assumptions obvious.  When I code

. sort group

I mean the data are to be sorted by group, and not, say, by time within
group.  (If the data happened to be sorted by time beforehand,
then a stable sort would yield a dataset sorted by group and time, and
that kind of behavior is what leads to undiscovered bugs.)

I have not been paying adequate attention to this thread, but I
gather that for someone, -sort- was creating the problem and option
-stable- solved it.

My first question, on hearing that, is "Why is that?".  What hidden
assumption is lying around?  What is it that the user really needs
to code?  When I type -sort group-, I should not be making assumptions
about how the data are already sorted.

Alejandro's problem may well be that the formula he is bootstrapping is not
deterministic and, if so, that should bother him.  At this point, all we really
know is that -stable- solved the reproducibility problem.  There is one thing,
however, that I can guarantee:  there is a program bug or a substantive error,
and -stable- is covering it up.

-- Bill
wgould@stata.com
```
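
The point about hidden sort-order assumptions can be illustrated with a minimal Stata sketch. The dataset and the variable names (group, time, x, t_fragile, t_explicit) are hypothetical and are not taken from the thread. The example assumes the within-group calculation needs "the first observation by time", which is exactly the kind of assumption that -sort group- alone does not state.

```stata
* Hypothetical data: 2 groups, 3 time points each, x depends on time
clear
set obs 6
generate group = ceil(_n/3)
generate time  = mod(_n - 1, 3) + 1
generate x     = 10*time

* Fragile: -sort group- leaves the order within group undefined, so
* x[1] may be a different observation from one run to the next.
sort group
by group: generate t_fragile = x[1]

* Explicit: state the within-group order the calculation relies on.
bysort group (time): generate t_explicit = x[1]

* Confirm that (group, time) uniquely identifies observations, so the
* order used above has no ties left to break.
isid group time

list group time x t_fragile t_explicit, sepby(group)
```

In this sketch t_explicit is the same on every run because the sort key has no ties, while t_fragile depends on how ties within group happen to be ordered. Adding -stable- to the first sort would make t_fragile reproducible, but only by relying on whatever order the data were in beforehand.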