Statalist The Stata Listserver


Re: Re: st: processing time

From   n j cox
Subject   Re: Re: st: processing time
Date   Thu, 22 Mar 2007 18:37:45 +0000

It would be interesting to know, in broad terms, what Stata does.
Setting aside efficiency matters for the moment, consider commands like

drop if mod(_n, 2)

and

drop if y == . | y[_n-1] == .

in which the decision on -drop-ping is sensitive to _n.
From examples like these, it seems that the
dropping cannot start before the identification of
observations to be dropped has finished.
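
A small check of that reasoning, as a sketch only: the six-observation
dataset and the variable id are made up for illustration. If Stata
renumbered observations while it was still deciding what to drop, each
survivor would become observation 1 in turn and nothing would be left.
Instead the odd-numbered observations go and the even-numbered ones stay.

```stata
* Illustrative data: 6 observations, id records the original _n
clear
set obs 6
generate id = _n

* mod(_n, 2) is 1 (true) for odd _n, so observations 1, 3, 5 are dropped
drop if mod(_n, 2)

* The survivors are the original observations 2, 4, 6
list id
```

This shows only the outcome, which is consistent with identification
finishing before dropping starts; it says nothing about how Stata
implements it internally.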

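I've not seen the code, so this is just a guess.
But David is surely right that two commands must entail
at a minimum two loops over the observations (and
perhaps even four, if each command needs one loop to identify
the observations to be dropped and another to drop them).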


David Kantor wrote:

I would think that the latter is more efficient, especially with
large datasets. You incur the cost of parsing and executing a command
once rather than twice (the expression is more complex, but I
don't suppose that matters much). Furthermore, the latter may be
markedly more efficient if there are many cases with b==. that do
not have a==. The reason is that when you drop observations, there
is, I suppose, a moving of records to close up the holes. With the
two-command method, some records will be moved twice rather than once.

I suppose it makes little difference for small datasets.

You can also -set rmsg on-, and run some experiments.
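
One way such an experiment might look, as a sketch. It assumes the two
approaches being compared are -drop if a==.- followed by -drop if b==.-
versus a single -drop if a==. | b==.-. The simulated data, the seed, the
dataset size and the 20% missing rate are arbitrary choices for
illustration. It uses -timer- rather than -set rmsg on- so that the two
timings are listed together at the end.

```stata
* Simulated data: a and b are each missing in about 20% of observations
clear
set seed 12345
set obs 1000000
generate a = cond(runiform() < 0.2, ., 1)
generate b = cond(runiform() < 0.2, ., 1)

timer clear

* Method 1: two separate -drop- commands
preserve
timer on 1
drop if a == .
drop if b == .
timer off 1
local n_two = _N
restore

* Method 2: one -drop- with a combined condition
preserve
timer on 2
drop if a == . | b == .
timer off 2
local n_one = _N
restore

* Both methods must leave the same number of observations
assert `n_two' == `n_one'

* Timer 1 is the two-command method, timer 2 the one-command method
timer list
```

Timings will vary with machine, Stata version and the pattern of
missing values, so it is worth repeating the run a few times before
drawing conclusions.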

Finally, be aware that a==. is not the general way to test for
missing values; it tests for equality with one specific missing
value, the system missing value (.).  The way to test for missing
values in general is mi(a) or, for numeric a, a>=. (which also
catches the extended missing values .a to .z). mi(a) is more general
still in that it works for string variables as well.
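
The difference between the three tests can be seen on a tiny example.
The data and the string variable s below are invented for illustration;
missing() is the full name of mi().

```stata
* Illustrative data: one nonmissing value, the system missing value,
* and two extended missing values
clear
input a
1
.
.a
.z
end

* Matches only the system missing value: 1 observation
count if a == .

* Matches every numeric missing value: 3 observations
count if a >= .

* Same result as a >= . for a numeric variable: 3 observations
count if missing(a)

* missing() also works for strings, where "" counts as missing
generate s = cond(_n == 1, "x", "")
count if missing(s)
```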
