Michael Blasnik

<statalist@hsphsun2.harvard.edu>

Re: st: Does Blasnik's Law apply to -use-?

Thu, 13 Sep 2007 12:51:41 -0400

These results differ from mine, and they do not directly address the question. You compare opening the entire file with opening part of the file using -in-. But the goal is to select only a subset of observations. For that, you would need either a second command after opening the entire file or the -use if _n>xxx & _n<yyy- construct. I find that the -if- approach takes more time than using -in- or simply opening the whole file. By the way, you can time individual commands more accurately with -set rmsg on- than by displaying the current time.
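A minimal do-file sketch of the comparison described above, assuming a hypothetical test file `mydata.dta` (created by the first block and written to the current working directory) with one hypothetical variable `x`. It times the three ways of ending up with observations 1001 to 2000, using -set rmsg on- so Stata reports the elapsed time after each command. The file size and the observation range are illustrative choices, not the ones used in this thread.

```stata
* Build a hypothetical test file: 1,000,000 observations of one variable
clear
set obs 1000000
generate double x = runiform()
save mydata, replace

* Report elapsed time after every command
set rmsg on

* (1) Load only the wanted range with -in-
use in 1001/2000 using mydata, clear
assert _N == 1000

* (2) Load only the wanted range with an -if- condition on _n
use if _n >= 1001 & _n <= 2000 using mydata, clear

* (3) Load the whole file, then keep the wanted range with a second command
use mydata, clear
keep in 1001/2000

set rmsg off
assert _N == 1000
```

Running each approach more than once, and in different orders, helps separate the cost of the command from disk caching effects.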

M Blasnik

From: "David Elliott" <dcelliott@gmail.com>

To: <statalist@hsphsun2.harvard.edu>

Sent: Thursday, September 13, 2007 12:28 PM

Subject: Re: st: Does Blasnik's Law apply to -use-?

I was alerted offlist by a member that the mailer had truncated my previous reply in this thread, so here it is again:

Having used -parmby- recently and having some understanding of what Roger is discussing, I'd like to offer the following. From my interpretation of how Stata stores data, the ability to -use in ##/##- would require the record indexes to be created by completely loading the data. I am currently working on a 4-million-record dataset and was able to run a quick test with a little program:

    n di "Begin: " _n c(current_date) " " c(current_time) _n
    use dss_data_05_06 in 1/1000, clear
    n di "Load using in 1/1000" _n c(current_date) " " c(current_time) _n
    use dss_data_05_06, clear
    n di "Ordinary load" _n c(current_date) " " c(current_time)

Output:

    Begin:
    12 Sep 2007 15:02:46

    Load using in 1/1000
    12 Sep 2007 15:02:56

    Ordinary load
    12 Sep 2007 15:03:06

I switched the loading order, and the load took 10 seconds either way. I don't think you can use this optimization.

DC Elliott
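The same test could be repeated with Stata's -timer- commands instead of displaying the clock time, which gives elapsed seconds per step rather than one-second clock resolution. This is a swapped-in timing technique, not the one used in the message above, and `bigdata.dta` with its variable `x` is a hypothetical stand-in for the 4-million-record file (the first block writes it to the current working directory).

```stata
* Hypothetical stand-in for the 4-million-record file used in the test
clear
set obs 4000000
generate double x = runiform()
save bigdata, replace

timer clear

* Timer 1: partial load of the first 1000 observations with -in-
timer on 1
use in 1/1000 using bigdata, clear
timer off 1

* Timer 2: ordinary load of the whole file
timer on 2
use bigdata, clear
timer off 2

* Report the elapsed seconds recorded by each timer
timer list
```

Swapping the order of the two timed loads, as was done in the original test, is still worthwhile, since the second read of a file may benefit from operating-system caching.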

