From: wgould@stata.com (William Gould, Stata)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problem with seed and bootstrap
Date: Mon, 19 Sep 2005 10:53:22 -0500

Svend Juul <SJ@SOCI.AU.DK> writes

> Imagine that -sort ... , stable- was the default, but that you could
> avoid it with an -unstable- option. Can anybody imagine a situation
> where a user would benefit from the -unstable- option?

My reply is that (1) -sort, stable- consumes more computer time, and (2) -sort, unstable- uncovers bugs.

(2) requires some explanation. We write down formulas all the time that state the minimum assumptions necessary to carry out a calculation. Such a formula might go,

    1. Make the following calculation within group:
           t_i = ...
    2. Then sum t_i to obtain the test statistic.

I know of a researcher (who shall remain nameless) who wrote down exactly such a calculation. He did simulations, too, not using Stata, and it worked well. I cannot remember whether the paper actually made it into print, but if not, it was on its way.

We were implementing this same test statistic in Stata at the researcher's request. He gave us some datasets and certified answers. We wrote our program and discovered that sometimes we got the claimed answer, and sometimes we did not. When we examined our code, we discovered that we got the "right" answer if we added -stable- to -sort- at the "Make the following calculation within group" step.

The problem was that the t_i calculation was not determinate: the formula was a function of within-group sort order, and nobody had noticed, not the original author, not the reviewers, and not his test runs. This led to a reconsideration of the statistic, and an improvement.

Let me add that I have other, less dramatic examples of the benefits of unstable. In those less dramatic cases, it was not the formula that was wrong, it was our code.

In a programming or procedural language, one states what is to be done. A good language makes the assumptions obvious. When I code

    . sort group

I mean the data are to be sorted by group, and not, say, sorted by time within group.
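A small illustration of the points above, written as a Python analogy rather than Stata code. The toy data, the helper names, and the "first minus last value" choice of t_i are all hypothetical. Python's sorted() is documented as stable, so sorting by group alone keeps whatever time order the rows already had; to mimic a sort that promises nothing about ties, the sketch breaks ties within group at random. This is only an analogy, not a description of how Stata's -sort- is implemented. An order-dependent t_i then gives different test statistics across runs, while a t_i that depends only on the group's values does not.

```python
import random
from itertools import groupby

# Hypothetical toy data, already sorted by time: (group, time, x)
rows = [
    ("b", 1, 5.0),
    ("a", 2, 3.0),
    ("b", 3, 4.0),
    ("a", 4, 7.0),
    ("a", 5, 1.0),
    ("b", 6, 2.0),
]


def stable_sort_by_group(data):
    # sorted() is stable: rows tied on group keep their prior (time) order.
    return sorted(data, key=lambda r: r[0])


def unstable_sort_by_group(data, rng):
    # Mimic an "unstable" sort: ties within group are broken at random,
    # so nothing about the prior order is guaranteed to survive.
    return sorted(data, key=lambda r: (r[0], rng.random()))


def order_dependent_statistic(data):
    # Hypothetical t_i = first x minus last x within group; the sum over
    # groups depends on within-group row order, which "sort by group" does not fix.
    total = 0.0
    for _, grp in groupby(data, key=lambda r: r[0]):
        xs = [r[2] for r in grp]
        total += xs[0] - xs[-1]
    return total


def order_free_statistic(data):
    # Hypothetical t_i that uses only the group's values (here, their sum),
    # so the result is the same whatever the within-group order.
    total = 0.0
    for _, grp in groupby(data, key=lambda r: r[0]):
        total += sum(r[2] for r in grp)
    return total


if __name__ == "__main__":
    by_group = stable_sort_by_group(rows)
    # Sorted by group and, by accident of the prior order, by time within group:
    print([(g, t) for g, t, _ in by_group])
    # [('a', 2), ('a', 4), ('a', 5), ('b', 1), ('b', 3), ('b', 6)]
    print(order_dependent_statistic(by_group))  # 5.0

    # Each seed is reproducible, but different tie orders give different answers.
    seeds = range(20)
    print(sorted({order_dependent_statistic(unstable_sort_by_group(rows, random.Random(s)))
                  for s in seeds}))
    print({order_free_statistic(unstable_sort_by_group(rows, random.Random(s)))
           for s in seeds})  # {22.0}
```

With the stable sort, the order-dependent statistic always comes out the same, but only because the rows happened to arrive in time order; with random tie-breaking it takes several values. One way to remove the dependence is to state the order the formula actually needs (for example, sort by group and time) or to use a t_i that does not depend on order at all.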
If the data happened to be sorted by time beforehand, then a stable sort would yield a dataset sorted by group and time, and that kind of behavior is what leads to undiscovered bugs.

I have not been paying adequate attention to this thread, but I gather that for someone, -sort- was creating the problem and option -stable- solved it. My first question, on hearing that, is "Why is that?" What hidden assumption is lying around? What is it that the user really needs to code? When I type -sort group-, I should not be making assumptions about how the data are already sorted.

Alejandro's problem may well be that the formula he is bootstrapping is not determinate and, if so, that should bother him. At this point, all we really know is that -stable- solved the reproducibility problem. There is one thing, however, that I can guarantee: there is a program bug or a substantive error, and -stable- is covering it up.

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups: Re: st: Problem with seed and bootstrap (from Richard Williams <Richard.A.Williams.5@ND.edu>)
