# st: RE: RE: re: statsby slowness

From: "Nick Cox" | Subject: st: RE: RE: re: statsby slowness | Date: Mon, 20 Aug 2007 10:01:55 +0100

```
The fact that -if- is always slower than an equivalent -in-
I call Blasnik's Law, not because Michael discovered it, but
because it needs a good name and he has done more than any
other user to make people aware of it.

Compare

keep in 1/100                        (1)

and

keep if _n <= 100                    (2)

and imagine Stata implementing each of these. You
should be able to tell at a glance that they mean
the same thing, but you're a human and you are good
at working out meanings.

With (1), Stata can work out very quickly that it should
-keep- the first 100 obs and -drop- everything else.

With (2), Stata is obliged by its own rules to test
every observation number _n against <= 100, and
to ask itself lots of questions like

_n is 2345. Is that <= 100? No.
So, don't -keep- this obs.

....

_n is 123456789. Is that <= 100? No.
So, don't -keep- this obs.

_n is 123456790. Is that <= 100? No.
So, don't -keep- this obs.

and so on,

because it has no intelligence to see the implication
that once you are past 100, further testing is
futile.

Hence the rule: Use -in- rather than -if- when they
are equivalent. Remember that with -if- Stata tests
_every_ observation to check whether the condition is
true, utterly regardless of whether it is "obvious"
that it need not do that. Stata doesn't do "obvious".

Nick
n.j.cox@durham.ac.uk

Nick Cox

> Interesting. You may get a bit more speed if
> you replace this
>
> egen rank_1 = rank(expression), by(ssrownum)
> egen rank_2 = rank(iso_VSV), by(ssrownum)
> egen corr = corr(rank_1 rank_2), by(ssrownum)
>
> by this:
>
> sort ssrownum
> by ssrownum : egen rank_1 = rank(expression)
> by ssrownum : egen rank_2 = rank(iso_VSV)
> by ssrownum : egen corr = corr(rank_1 rank_2)
>
> The two code segments are equivalent in what
> you end with, but not in when they -sort-.
>
> Similarly
>
> keep if _n >= `start' & _n <= `stop'
>
> should be faster as
>
> keep in `start'/`stop'
>
> and I would always use the built-in -sqrt()-
> when it applies, rather than powering to 0.5.
>

```
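
One way to check the -in- versus -if- difference on your own machine is to time both forms with -timer-. What follows is only a rough sketch, not part of the original post: the dataset size and the variable name `id` are made up for illustration, and the timings you see will depend on your hardware and Stata version. -preserve- and -restore- sit outside the timed regions so that both commands start from the same full dataset.

```stata
* Hypothetical test data: 5 million observations (size is an assumption)
clear
set obs 5000000
generate long id = _n

timer clear

* (1) -in-: Stata is told directly which observations to keep
preserve
timer on 1
keep in 1/100
timer off 1
restore

* (2) -if-: the condition is evaluated for every observation
preserve
timer on 2
keep if _n <= 100
timer off 2
restore

* Report elapsed seconds for timers 1 and 2
timer list
```

If the argument above holds, timer 2 should report the larger elapsed time, and the gap should widen as the number of observations grows.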