
Re: st: processing time

From	David Kantor <[email protected]>
To	[email protected]
Subject	Re: st: processing time
Date	Thu, 22 Mar 2007 14:02:55 -0400

At 12:55 PM 3/22/2007, Jon Schwabish wrote:

> Which is more efficient (in terms of processing time)?
>
> drop if a==.
> drop if b==.
>
>   OR
>
> drop if a==. | b==.

I would think that the latter is more efficient, especially with large datasets. You incur the cost of parsing and executing a command once rather than twice; the expression is more complex, but I don't suppose that matters much. Furthermore, the single command may be even more efficient if there are many observations with b==. that do not have a==. The reason is that when you drop observations, there is, I suppose, a moving of records to close up the holes. With the two-command method, some records will be moved twice rather than once.

I suppose it makes little difference for small datasets.

You can also -set rmsg on-, and run some experiments.
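
A rough sketch of such an experiment follows. The dataset is made up purely for illustration: the number of observations, the 10% missing rate, and the seed are arbitrary assumptions, and runiform() assumes a Stata release that provides it (older releases use uniform()). With rmsg on, Stata reports the elapsed time after each command, so the two times from the first method have to be added together by hand.

```stata
* Build an artificial dataset (sizes and missing rates are arbitrary assumptions)
clear
set seed 12345
set obs 2000000
generate a = cond(runiform() < 0.1, ., runiform())
generate b = cond(runiform() < 0.1, ., runiform())

* Report the elapsed time after every command
set rmsg on

* Method 1: two separate commands (add the two reported times)
preserve
drop if a==.
drop if b==.
restore

* Method 2: one command with a compound condition
drop if a==. | b==.

set rmsg off
```

The preserve/restore pair is only there so that both methods start from the same data; the times reported for those two commands should be ignored. Results will vary by machine and dataset, so it is worth repeating the run a few times.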

Finally, be aware that a==. is not the general way to test for missing values; it tests for equality with one specific missing value, the system missing value (.), and does not catch the extended missing values .a through .z. The general tests are mi(a) or a>=. (mi() is an abbreviation of missing()). The mi(a) form is more general still, in that it works for string variables as well.
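
To make the difference concrete, here is a small sketch with a made-up four-observation dataset. The variable s is a hypothetical string variable added only to show the string case; it is not part of the original question.

```stata
* Toy data: a has one system missing (.) and two extended missings (.a, .z)
clear
input a str3 s
1   "x"
.   ""
.a  "y"
.z  ""
end

* Counts 1: only the system missing value . matches
count if a==.

* Counts 3: ., .a and .z all compare >= .
count if a>=.

* Counts 3: same result as a>=.
count if mi(a)

* Counts 2: for strings, the empty string "" is treated as missing
count if mi(s)

* Drops observations where either variable is missing, in one command
drop if mi(a, s)
```

The last line relies on missing() accepting several arguments and returning true if any of them is missing, which also gives a compact way to write the single-command drop from the original question: drop if mi(a, b).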

HTH
--David

