Statalist The Stata Listserver



Re: st: processing time


From   David Kantor <kantor.d@att.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: processing time
Date   Thu, 22 Mar 2007 14:02:55 -0400

At 12:55 PM 3/22/2007, Jon Schwabish wrote:

> Which is more efficient (in terms of processing time)?
>
> drop if a==.
> drop if b==.
>
>   OR
>
> drop if a==. | b==.

I would think that the latter is more efficient, especially with large datasets. You incur the cost of parsing and executing a command once rather than twice (the expression is more complex, but I don't suppose that matters much). Furthermore, the latter may be considerably more efficient if there are many cases with b==. that do not have a==. . The reason is that when you drop observations, there is, I suppose, some moving of records to close up the holes. With the two-command method, some records will be moved twice rather than once.

I suppose it makes little difference for small datasets.

You can also -set rmsg on- and run some experiments to time each approach.
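Such an experiment might look like the following do-file. It is only a sketch: the data are simulated (about 10% missing in each of the hypothetical variables a and b), and the number of observations and the seed are arbitrary choices. With -rmsg- on, Stata reports the elapsed time after each command, so for the first method you would add the times of the two -drop- lines; the -preserve-/-restore- timings are not part of the comparison.

```stata
* Timing sketch on simulated data; a and b mirror the question,
* but the data themselves are made up for illustration.
clear
set obs 1000000
set seed 12345
generate a = cond(runiform() < 0.1, ., runiform())
generate b = cond(runiform() < 0.1, ., runiform())

set rmsg on

* Method 1: two separate -drop- commands (sum the two reported times)
preserve
drop if a==.
drop if b==.
restore

* Method 2: a single -drop- with a combined condition
drop if a==. | b==.

set rmsg off
```

Timings vary from run to run, so it is worth repeating each method a few times before drawing conclusions.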

Finally, be aware that a==. is not the general way to test for missing values; it tests for equality with only one specific missing value, the system missing value (.). The way to test for missing values in general is mi(a) or a>=. . The mi(a) method is even more general, in that it also works for string variables.
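The difference shows up as soon as extended missing values (.a through .z) are present. The small example below is a sketch with made-up data: the variable a holds one nonmissing value, the system missing value, and two extended missing values, and s is a hypothetical string variable built from it. It uses the full name missing(), of which mi() is the short form used above.

```stata
* Hypothetical data: one nonmissing value, the system missing value (.),
* and two extended missing values (.a and .b).
clear
input a
1
.
.a
.b
end

count if a == .       // only the system missing value: displays 1
count if a >= .       // ., .a and .b: displays 3
count if missing(a)   // same as a >= . for numeric variables: displays 3

* For strings, missing() tests for the empty string "".
generate str1 s = cond(a == 1, "x", "")
count if missing(s)   // displays 3
```

This works because Stata orders the missing values above all nonmissing numbers, with . < .a < .b < ... < .z, so a>=. catches every one of them.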

HTH
--David





© Copyright 1996–2014 StataCorp LP