From: "David Elliott" <dcelliott@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Does Blasnik's Law apply to -use-?
Date: Fri, 14 Sep 2007 11:07:05 -0300

Being Stata users, we should approach this in a rigorous scientific fashion:

X-----begin-----X
program define intest
    version 9.0
    *! version 1.0.0 2007.09.13
    *! Simulate using part of file with in #/##
    *! by David C. Elliott
    *!
    *! using      name of trial dataset
    *! postname   specifies filename of postfile
    *! numblocks  is number of file blocks to create
    syntax using/ , POSTname(string) NUMblocks(int)
    local more `c(more)'
    set more off
    use "`using'", clear    // load once first to eliminate any first-pass caching effects
    local recblock = round(`c(N)'/`numblocks', 1)
    tempname post
    postfile `post' double block float timein timeif using "`postname'", every(10) replace
    timer clear 1
    noisily display _n(2) "{txt}{col 11}{center 10:-- IN --}{center 10:-- IF --}" _n ///
        "{center 10:Block}{center 10:Time}{center 10:Time}" _n ///
        "{hline 30}"
    local lastblock = `c(N)' - `recblock'
    forvalues i = 1(`recblock')`lastblock' {
        local block = `i'
        foreach I in if in {
            if "`I'" == "in" {
                local ifin in `i'/`=`i'+`recblock''
            }
            else {
                local ifin if inrange(_n, `i', `=`i'+`recblock'')
            }
            // time loading the same block of observations with -in- and with -if-
            timer on 1
            use `ifin' using "`using'", clear
            timer off 1
            quietly timer list 1
            local time`I' : display %5.2f round(`r(t1)', .01)
            timer clear 1
        }
        post `post' (`block') (`timein') (`timeif')
        noisily display "{res}{ralign 10:`block'}{ralign 10:`timein'}{ralign 10:`timeif'}"
    }
    postclose `post'
    set more `more'
    use "`postname'", clear
    label variable block "Record Block"
    label variable timein "Load Time using IN"
    label variable timeif "Load Time using IF"
    twoway line timein block || line timeif block
end
X-----end-----X

eg:

. intest using dss_data_06_07.dta , postname(intest.dta) numblocks(100)

           -- IN --  -- IF --
  Block      Time      Time
------------------------------
         1      0.64      0.88
     17278      0.47      0.77
     34555      0.47      0.77
     51832      0.47      0.78
     69109      0.45      0.78
     86386      0.45      0.78
    103663      0.47      0.78
    120940      0.47      0.77
...

This ado-file runs an -if- versus -in- simulation and graphs the results.
From my findings I can confirm a speed advantage of about 50% for -in- on a dataset with obs: 1,727,673; vars: 28; size: 266,061,642.

However, things get murkier. Run a simulation, then use -set memory- to give Stata as much memory as the system will allow, and run the simulation again. Doing this eliminates the system's ability to cache the file. Ordinarily, subject to file size and available memory, Stata may be reading the file from cache, and in that case one will see an advantage to using -in-. However, once the caching advantage is eliminated by increasing Stata's memory, my simulations show that the speed advantage of -in- disappears. I also tested this on large network databases and was unable to demonstrate any advantage to -in-.

So back to Roger's initial question. It would appear that for cacheable file sizes and large numbers of by-groups, a strategy using -in- might be feasible. Setting up the by-groups so that each can be selected with -in- carries an overhead penalty, involving sorts and the like. For a small number of by-groups the speed advantage might be lost, but for many levels and a large number of iterations there would be an advantage.

DC Elliott

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
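The by-group strategy discussed in this message (sort once, then load each group with -in-) might look roughly like the sketch below. It is not the author's code and makes several assumptions: the auto dataset shipped with Stata stands in for a large file, -foreign- stands in for the by-group variable, and the names firstobs, lastobs, f#, and l# are hypothetical. The sort, save, and collapse at the top are the one-time overhead mentioned above; the loop then reads only each group's contiguous block of observations.

```stata
* Sketch only: auto.dta stands in for a large file, foreign for the by-group
* variable; firstobs, lastobs, f#, l# are hypothetical names.
sysuse auto, clear
sort foreign
tempfile sorted
save `sorted'

* One-time overhead: record the first and last observation of each by-group
gen long obs = _n
collapse (min) firstobs=obs (max) lastobs=obs, by(foreign)
local G = _N
forvalues g = 1/`G' {
    local f`g' = firstobs[`g']
    local l`g' = lastobs[`g']
}

* Per-group loop: each pass loads only that group's rows with -use ... in-
forvalues g = 1/`G' {
    use in `f`g''/`l`g'' using `sorted', clear
    * ... per-group analysis would go here ...
    display "group `g': obs `f`g''-`l`g'', N = " _N
}
```

Whether this beats a single pass with -by:- or -use ... if- depends on the caching behaviour described above, so it is worth timing both on your own data.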