The Stata listserver

Re: st: -stylerules- updated on SSC


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: -stylerules- updated on SSC
Date   Thu, 14 Apr 2005 08:18:35 -0400

In my experience, the trade-off in execution time between preserve/restore and a series of -if `touse'- qualifiers can reach the tipping point fairly quickly. I find that about 10 (±5) -if `touse'- statements require about as much time as one preserve / drop if !`touse' / restore sequence in fairly large datasets (in small datasets, it matters less). So a programmer may want to consider the preserve, drop if !`touse', restore approach when more than about 10 -if `touse'- lines need to be executed. These timings are for a machine where temp files are written to the local hard drive. In practice, I tend to avoid the preserve/restore approach unless the speed advantage is clear (perhaps 20 or more -if `touse'- lines).
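A minimal sketch of the two patterns being compared, written as a hypothetical program -mysumm- (the program name, the -dropfirst- option, and the single -summarize- call are illustrative assumptions, not taken from the thread). Approach A qualifies each command with -if `touse'-; approach B runs preserve, keeps only the marked observations once, and restores. A real program would have many commands where this one has a single -summarize-, and the number of such commands is what drives the trade-off described above.

```stata
* Hypothetical program contrasting the two approaches
capture program drop mysumm
program define mysumm, rclass
    version 9
    syntax varname [if] [in] [, DROPfirst]
    marksample touse

    if "`dropfirst'" != "" {
        * Approach B: discard the unwanted observations once ...
        preserve
        quietly keep if `touse'
        * ... so later commands need no -if `touse'- qualifier
        quietly summarize `varlist'
        return scalar mean = r(mean)
        * put the full dataset back
        restore
    }
    else {
        * Approach A: qualify every command with -if `touse'-
        quietly summarize `varlist' if `touse'
        return scalar mean = r(mean)
    }
end
```

For example, after -sysuse auto, clear-, both -mysumm price if foreign- and -mysumm price if foreign, dropfirst- should return the same r(mean), and the dataset in memory is unchanged afterwards in either case.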

Michael Blasnik
michael.blasnik@verizon.net


----- Original Message ----- From: "Roger Newson" <roger.newson@kcl.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Sent: Thursday, April 14, 2005 7:19 AM
Subject: Re: st: -stylerules- updated on SSC



Re Point 1, file input and output is usually MUCH more time-consuming than in-memory operations such as checking `touse'. However, re Point 2, Richard is right, and a block of code which modifies the data *permanently* should indeed start with a single -preserve- and end with a single -restore, not-, just in case the user presses -break-.
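The pattern in Point 2 can be sketched as follows; the program -rescale- and its -factor()- option are hypothetical. There is one -preserve- at the start of the block that changes the data and one -restore, not- at the end. If the user presses Break (or an error occurs) before -restore, not- is reached, Stata restores the preserved copy when the program exits, so the data are not left half-modified.

```stata
* Hypothetical program: a block that modifies the data permanently
capture program drop rescale
program define rescale
    version 9
    syntax varname(numeric) , Factor(real)
    * Keep a copy in case the user presses Break or an error occurs
    preserve
    quietly replace `varlist' = `varlist' / `factor'
    * ... further permanent changes to the data would go here ...
    * All steps succeeded: cancel the preserve so the changes are kept
    restore, not
end
```

Without the final -restore, not-, the changes would be discarded when the program ended, because Stata automatically restores preserved data at program termination.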

Roger


At 01:40 14/04/2005, Richard Williams wrote:

At 04:07 PM 4/13/2005 +0100, Nick Cox wrote:
The package -stylerules- on SSC has been
updated. I started by making a few changes
in the light of Stata 9, but mostly thought
of some other things to mention while
I was doing that.
Some very good rules there, Nick. I'd be curious what thoughts people have about this one:

"Avoid preserve if possible. preserve is attractive to the programmer but can be expensive in time for the user with large data files. Programmers should learn to master marksample."

Suppose I ignore that advice and instead do something like

preserve
keep if `touse'

1. Will I gain some speed, because Stata will be working with a smaller data set and won't have to repeatedly process -if `touse'- commands?

<snip>



