The Stata listserver

Re: st: -stylerules- updated on SSC


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   "Michael Blasnik" <michael.blasnik@verizon.net>, <statalist@hsphsun2.harvard.edu>
Subject   Re: st: -stylerules- updated on SSC
Date   Thu, 14 Apr 2005 08:40:37 -0400

I just wanted to follow up to say that the rough estimate I provided (about 10 -if `touse'- statements costing as much as one -preserve-/-restore- pair) can be far off, depending on the proportion of observations selected by the if condition and on the relative width vs. length of the dataset. The time required for the preserve/restore varies with the size of the dataset (and the speed of writing to the hard drive); the time required to execute each -if `touse'- command varies with the number of observations in the full dataset; and the time required to execute each of those commands without the -if `touse'- depends on the number of observations remaining after -drop if !`touse'-.

So the preserve/restore approach is favored for commands operating on long datasets, especially if the `touse' sample is typically small compared to the full dataset. In these situations, the break-even point may be about 5 -if `touse'- statements.

The -if `touse'- approach is favored for wide datasets where the `touse' sample is relatively large compared to the full dataset. In these situations, the break-even point may be closer to 15-20 -if `touse'- statements.
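
One way to check where the break-even point falls on a particular machine and dataset is to time both approaches directly. The following is only a rough sketch under stated assumptions: the simulated dataset, the variable names x and touse, the 10% selection rate, and the choice of 10 repetitions are all made up for illustration, and -runiform()- assumes a Stata release that provides it.

```stata
* Simulated data (assumption): 1,000,000 observations, about 10% selected
clear
set obs 1000000
generate double x = runiform()
generate byte touse = runiform() < 0.1

timer clear

* Approach 1: repeat the -if- qualifier on every command
timer on 1
forvalues i = 1/10 {
    quietly summarize x if touse
}
timer off 1

* Approach 2: preserve once, drop unselected observations, run unqualified
timer on 2
preserve
quietly drop if !touse
forvalues i = 1/10 {
    quietly summarize x
}
restore
timer off 2

* Timer 1 = -if touse- approach, timer 2 = preserve/restore approach
timer list
```

Changing the number of observations, the number of variables, the selection rate, and the loop count should show how the balance shifts between the two approaches.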

Michael Blasnik
michael.blasnik@verizon.net


----- Original Message -----
From: "Michael Blasnik" <michael.blasnik@verizon.net>
To: <statalist@hsphsun2.harvard.edu>
Sent: Thursday, April 14, 2005 8:18 AM
Subject: Re: st: -stylerules- updated on SSC



In my experience, the trade-off in execution time between preserve/restore and a series of -if `touse'- statements can reach the tipping point fairly quickly. I find that about 10 (+-5) -if `touse'- statements require about as much time as one preserve / drop if !`touse' / restore sequence in fairly large datasets (in small datasets, it matters less). So a programmer may want to consider the preserve, drop if !`touse', restore approach if more than about 10 -if `touse'- lines need to be executed. These timings are for a machine where temp files are written to the local hard drive. In practice, I tend to avoid the preserve/restore approach unless the speed advantage is clear (perhaps 20+ -if `touse'- lines).
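
For readers who have not seen the two patterns side by side, here is a minimal sketch. The program names demo_touse and demo_preserve and the particular calculation are hypothetical, chosen only to show where the -if `touse'- qualifiers go in one version and disappear in the other.

```stata
* Hypothetical program: every command carries its own -if `touse'- qualifier
program define demo_touse
    version 8
    syntax varname(numeric) [if] [in]
    marksample touse
    tempvar dev
    quietly summarize `varlist' if `touse'
    quietly generate double `dev' = `varlist' - r(mean) if `touse'
    quietly summarize `dev' if `touse', detail
    display "median deviation = " r(p50)
end

* Hypothetical program: drop unselected observations once, then run unqualified
program define demo_preserve
    version 8
    syntax varname(numeric) [if] [in]
    marksample touse
    preserve
    quietly drop if !`touse'
    tempvar dev
    quietly summarize `varlist'
    quietly generate double `dev' = `varlist' - r(mean)
    quietly summarize `dev', detail
    display "median deviation = " r(p50)
    restore
end
```

With only three qualified commands, as here, the first form would normally be the cheaper one by the estimates above; the second form only starts to pay off as the number of qualified commands grows.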

Michael Blasnik
michael.blasnik@verizon.net


----- Original Message -----
From: "Roger Newson" <roger.newson@kcl.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Sent: Thursday, April 14, 2005 7:19 AM
Subject: Re: st: -stylerules- updated on SSC



Re Point 1, file input and output is usually MUCH more time-consuming than in-memory operations such as checking `touse'. However, re Point 2, Richard is right, and a block of code which modifies the data *permanently* should indeed start with a single -preserve- and end with a single -restore, not-, just in case the user presses -break-.
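
A minimal sketch of that pattern follows. The program name demo_modify and the particular modification are hypothetical. The point is only that the -preserve- comes before any change to the data and the -restore, not- comes after the last one.

```stata
* Hypothetical program that changes the user's data permanently
program define demo_modify
    version 8
    syntax varname(numeric) [if] [in]
    marksample touse
    * Snapshot first: if the program stops early (error or Break),
    * Stata restores the original data automatically
    preserve
    quietly keep if `touse'
    quietly replace `varlist' = abs(`varlist')
    * Reached only on success: cancel the pending restore and keep the changes
    restore, not
end
```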

Roger


