Statalist



Re: st: Does Blasnik's Law apply to -use-?


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Does Blasnik's Law apply to -use-?
Date   Tue, 18 Sep 2007 18:54:48 +0200

Hello Roger,

Could you please summarize what the warning was about? In
particular, does it relate to "_prefix" commands or to "_"-prefixed
commands (where "_prefix_expand" would be an example of the former
and "_regress" an example of the latter)?

Though the help for the "_prefix" commands seems interesting, I
find it more exciting to learn about commands that are not only
undocumented but not mentioned anywhere, not even on the
internet (Google currently returns 0 links). Does anyone have an idea
of how the "_xt..." commands work? I mean these:

_xtarm
_xtmka
_xtmkz
_xtzw
_xtwhw
_xta2

Does anyone have a complete list of _all Stata commands and is willing
to present it to the community?

Thank you,
   Sergiy





On 9/16/07, Newson, Roger B <r.newson@imperial.ac.uk> wrote:
> Thanks to David Elliot, Mike Blasnik and David Airey for their very
> helpful and detailed replies to my query. These shall be used to inform
> the first Stata 10 update to -parmby-, when I have Stata 10.
>
> And thanks also to Vince Wiggins, who warned me (during the 13th UK
> Stata User Meeting last week) of the dangers of ordinary users trying to
> get too deep into the undocumented _prefix suite of commands, used
> internally by StataCorp for -statsby- and other prefixes. (In Stata,
> type
>
> whelp _prefix
>
> to find out more about these.)
>
> Best wishes
>
> Roger
>
>
> Roger Newson
> Lecturer in Medical Statistics
> Respiratory Epidemiology and Public Health Group
> National Heart and Lung Institute
> Imperial College London
> Royal Brompton campus
> Room 33, Emmanuel Kaye Building
> 1B Manresa Road
> London SW3 6LR
> UNITED KINGDOM
> Tel: +44 (0)20 7352 8121 ext 3381
> Fax: +44 (0)20 7351 8322
> Email: r.newson@imperial.ac.uk
> Web page: www.imperial.ac.uk/nhli/r.newson/
> Departmental Web page:
> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/pop
> genetics/reph/
>
> Opinions expressed are those of the author, not of the institution.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Elliott
> Sent: 14 September 2007 15:07
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Does Blasnik's Law apply to -use-?
>
> Being Stata users, we should approach this in a rigorous scientific
> fashion:
>
> X-----begin-----X
>
> program define intest
> version 9.0
>
> *! version 1.0.0  2007.09.13
> *! Simulate using part of file with in #/##
> *! by David C. Elliott
> *!
> *! using name of trial dataset
> *! postname specifies filename of postfile
> *! numblocks is number of file blocks to create
>
>
> syntax using/ ,POSTname(string) NUMblocks(int)
>
> local more `c(more)'
> set more off
>
> use `using', clear // Load first to eliminate any first-pass caching effects
> local recblock = round(`c(N)'/`numblocks',1)
>
> tempname post
> postfile  `post' double block float timein timeif using `postname', ///
>        every(10) replace
>
> timer clear 1
> n di _n(2) "{txt}{col 11}{center 10:-- IF --}{center 10:-- IN --}" _n ///
>  "{center 10:Block}{center 10:Time}{center 10:Time}" _n ///
>  "{hline 30}"
> local lastblock = `c(N)' - `recblock'
> forvalues i=1(`recblock')`lastblock' {
>        local block = `i'
>        foreach I in if in {
>                if "`I'" == "in" {
>                        local ifin in `i'/`=`i'+`recblock''
>                        }
>                        else {
>                                local ifin if inrange(_n, `i', `=`i'+`recblock'')
>                                }
>                timer on 1
>                use `using' `ifin', clear
>                timer off 1
>                qui timer list 1
>                local time`I' :display %5.2f round(`r(t1)',.01)
>                timer clear 1
>                }
>        post `post' (`block') (`timein') (`timeif')
>        n di "{res}{ralign 10:`block'}{ralign 10:`timeif'}{ralign 10:`timein'}"
>        }
> postclose `post'
> set more `more'
> use `postname', clear
> lab var block "Record Block"
> lab var timein "Load Time using IN"
> lab var timeif "Load Time using IF"
> tw line timein block || line timeif block
> end
>
> X-----end-----X
>
> eg:
>
> . intest using dss_data_06_07.dta , postname(intest.dta) numblocks(100)
>
>
>           -- IN --  -- IF --
>  Block      Time      Time
> ------------------------------
>         1      0.64      0.88
>     17278      0.47      0.77
>     34555      0.47      0.77
>     51832      0.47      0.78
>     69109      0.45      0.78
>     86386      0.45      0.78
>    103663      0.47      0.78
>    120940      0.47      0.77
>  ...
>
> This ado-file runs an -if- versus -in- simulation and graphs the
> results.  From my findings I can confirm a speed advantage of about
> 50% using -in- on a dataset with obs:1,727,673 vars:28 size:266,061,642
>
> However, things get murkier.  Run a simulation, then max out Stata's
> memory setting with as much memory as the system will give you and run
> the simulation again.  When you do this, you eliminate the system's
> ability to cache the file.  Ordinarily, subject to filesize and
> available memory, Stata may be reading the file from cache.  If this
> is the case, one will see an advantage to using -in-.  However, if the
> caching advantage is eliminated by increasing Stata's memory, my
> simulations show that the speed advantage of -in- disappears.  I also
> tested this on large network databases and was unable to demonstrate
> any advantage to -in-.
>
> So back to Roger's initial question.  It would appear that for
> cacheable filesizes and large numbers of bygroups a strategy using
> -in- might be feasible.  There is an overhead penalty of setting up
> the bygroups to make them selectable using -in- involving sorts and
> the like.  For a small number of bygroups the speed advantages might
> be lost, but for many levels and a large number of iterations there
> would be an advantage.
>
> DC Elliott
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP