
From: "Michael Blasnik" <[email protected]>
To: <[email protected]>
Subject: st: Re: Does Blasnik's Law apply to -use-?
Date: Wed, 12 Sep 2007 13:45:21 -0400

...

Based on a few tests, it does appear to apply. The -in- approach reduced execution time by about 50% when selecting 100K observations from the middle of a file with 7 million observations.
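A minimal sketch of how such a timing comparison could be set up. My actual test file and code are not shown here, so this assumes a synthetic 7-million-observation file built on the spot. It uses the documented subset form of -use- (use [varlist] [if] [in] using filename) and Stata's -timer- command. The variable id is a hypothetical addition that stores each observation's position, so the -if- version does not depend on how _n behaves while a file is being read. Timings will vary by machine, so no results are shown.

```stata
* Hypothetical benchmark: a synthetic file of 7 million observations
clear
set obs 7000000
generate long id = _n            // observation number, saved with the data
generate double x = runiform()
tempfile big
save "`big'"

timer clear

* -in- version: load 100,000 observations from the middle of the file
timer on 1
use in 3450001/3550000 using "`big'", clear
timer off 1

* -if- version: the condition presumably has to be evaluated
* for every observation in the file
timer on 2
use if inrange(id, 3450001, 3550000) using "`big'", clear
timer off 2

* Compare timer 1 (-in-) with timer 2 (-if-)
timer list
```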

In many cases, the difference in execution time for a single run of a command is fairly trivial; in my tests it was only about 0.8 seconds. The real speed benefits come when the command is executed many times in a loop over a large dataset, such as identifying the members of each panel in a dataset with thousands of panels. If -parmby- works like -statsby-, then the speed benefits will be substantial for users working with large datasets and many levels of the -by- variable, but modest for those with few levels or smaller datasets.
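A minimal sketch of that loop pattern, assuming a hypothetical panel dataset of 1,000 panels with 50 observations each (the names panel, y, obsno, first and last are illustrative only, and this is not how -parmby- or -statsby- are actually written). After a single -sort-, each panel occupies a contiguous block of observations, so its first and last observation numbers can be computed once and reused with -in-. The -if- loop is kept alongside for comparison.

```stata
* Hypothetical panel data: 1,000 panels x 50 observations
clear
set obs 50000
generate long panel = ceil(_n/50)
generate double y = rnormal()

* Sort once so each panel is a contiguous block, then record its bounds
sort panel
generate long obsno = _n
by panel: generate long first = obsno[1]
by panel: generate long last  = obsno[_N]

timer clear

* -if- loop: every pass tests the condition on all 50,000 observations
timer on 1
levelsof panel, local(ids)
foreach p of local ids {
    quietly summarize y if panel == `p'
}
timer off 1

* -in- loop: every pass touches only the rows of the current panel
timer on 2
local i = 1
while `i' <= _N {
    local f = first[`i']
    local l = last[`i']
    quietly summarize y in `f'/`l'
    local i = `l' + 1               // jump to the first row of the next panel
}
timer off 2

timer list
```

The gain comes from doing the sort and the bound calculation once, so the per-panel cost no longer grows with the total number of observations.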

Michael Blasnik

of Blasnik's law ;)

----- Original Message -----
From: "Newson, Roger B" <[email protected]>
To: <[email protected]>
Sent: Wednesday, September 12, 2007 10:03 AM
Subject: st: Does Blasnik's Law apply to -use-?

I have a query about Blasnik's Law, first named in the Statalist archives by Nick Cox at http://www.stata.com/statalist/archive/2007-08/msg00668.html, which states that using the -in- qualifier takes less computing time than the equivalent -if- qualifier. For instance,

    regress mpg weight in 53/74

uses less time than

    regress mpg weight if _n>=53 & _n<=74

because Stata does not have to check every observation in the dataset in memory the first way, but has to do so the second way.

My query is: does Blasnik's Law apply to the -use- command? That is to say, does the statement

    use mybigdata.dta in 3959/4030

use much less computing time than the statement

    use mybigdata.dta if _n>=3959 & _n<=4030

which should load the same data into memory? I ask because, as I understand it, Stata datasets are sequential-access files (unlike SAS datasets, which I understand are random-access, with the option of having multiple indices), and this should imply that Stata has to read through observations 1 to 3958 before reading observation 3959.

My motivation is that I wish to streamline the command -parmby-, which currently processes multiple by-groups by inputting the whole dataset repeatedly, using the -restore, preserve- command, and then dropping all by-groups except one. I am trying to think of a better way.

Best wishes (and thanks in advance)

Roger
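A minimal sketch, using the auto dataset shipped with Stata (74 observations, no missing values in mpg or weight), that runs the two -regress- commands from the question and confirms they estimate the same model on the same 22 observations; only the way the estimation sample is selected differs. The scalar names b_in and b_if are illustrative.

```stata
sysuse auto, clear

* -in-: the sample is given directly as an observation range
regress mpg weight in 53/74
scalar b_in = _b[weight]

* -if-: the condition is evaluated for each of the 74 observations
regress mpg weight if _n >= 53 & _n <= 74
scalar b_if = _b[weight]

* Same observations, same computation, so the difference should be zero
display scalar(b_in) - scalar(b_if)
```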
