Statalist



Re: st: RE: do-files as programs


From   "Gabi Huiber" <ghuiber@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: do-files as programs
Date   Thu, 25 Sep 2008 16:47:39 -0400

Martin, thank you for the pointers. For what they're worth, here are my findings:

There are four basic ways, it seems to me, to organize a project in do-files:

1. You just make a list of instructions, executed one after another.
If you need any of them executed more than once, put them inside a
foreach loop. That's one do-file, and it will get as long and involved
as the project demands. It's the way we program when we are new to
Stata.

2. You break up the problem into smaller do-files and have a master
file call them as needed.

Neither of the above makes use of the program feature. Do-files are
read in as many times as they are used. Approach 2 has certain
advantages over approach 1 in ease of debugging and general
readability, as short do-files are easier to pore over than long ones,
but the project will have to rely on multiple inter-linked do-files
instead of one. My guess is that the project doesn't have to be
terribly complex before the advantages outweigh this drawback.

3. You make a do-file organized broadly as follows: in the first
section you declare any programs you need, then in the second you
invoke them as needed.

4. The programs at 3. above are saved as separate do-files. A master
file runs each of them once with the "do file.do" command, which
defines the program, then executes the program as many times as needed
by invoking its name. So, both 3. and 4. do make use of this "program"
feature.
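
A minimal sketch of approach 3, for illustration only. The program name
tabdummy and the use of Stata's bundled auto dataset are assumptions
made for this example, not part of the original project:

```stata
// ---- section 1: declare the programs ----
capture program drop tabdummy        // hypothetical program name
program define tabdummy
    // tabulate the variable named in the first argument and
    // leave the cell counts in a matrix called cell
    args dummyvar
    tabulate `dummyvar', matcell(cell)
end

// ---- section 2: invoke them as needed ----
sysuse auto, clear                   // bundled example data (assumption)
tabdummy foreign
matrix list cell
```

For approach 4, section 1 would move into its own do-file, which the
master file runs once with "do" before invoking tabdummy by name.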

My test project: I had to tabulate one dummy variable in 30 different
files, then save the matcells in a master matrix. To make the run long
enough to time, I ran the same thing twice. I organized the project in
the four ways above:

1. One do-file with no programs in it;
2. Four separate do-files organized as a master calling the other
three twice each;
3. One do-file with three programs in it, each invoked twice; and finally
4. Four do-files, where the master was calling in the other three
once, then invoking them each twice.

The results are as follows: 3 was fastest at 6 seconds, followed by 4
at 9 seconds or so. 2 and 1 were about equally bad, some 11-12 seconds
each.
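
Timings like these can be taken with Stata's built-in -timer- command.
A minimal sketch, assuming the bundled auto dataset stands in for the
project files; the loop count of 1000 is arbitrary and the block being
timed is only a placeholder for the real work:

```stata
sysuse auto, clear                   // bundled example data (assumption)

timer clear 1                        // reset timer slot 1
timer on 1
forvalues i = 1/1000 {
    // placeholder for the block being timed
    quietly tabulate foreign, matcell(cell)
}
timer off 1

timer list 1                         // report elapsed time for slot 1
display "elapsed seconds: " r(t1)
```

Wrapping each of the four variants in its own timer slot would give
directly comparable numbers from a single run.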

This suggests that declaring do-files as programs reduces run time, at
least in this test. I hope this helps somebody.

Gabi

On Thu, Sep 25, 2008 at 12:52 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
> -set rmsg on- or -h profiler- to be sure. Once the -program- has been read
> once, Stata holds it in memory (-pr dir- to see that) so I do not find this
> surprising at all...
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Gabi Huiber
> Sent: Thursday, September 25, 2008 6:48 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: do-files as programs
>
> I am comparing two ways to solve the same problem and I can't see why
> one of them runs more quickly, but it looks that way. Is there any
> reason why do-files declared as programs might run faster?
>
> Suppose you have a do-file call another a few times, inside a loop, like so:
>
> _____ example 1 starts here
>
> // this is my main file
>
> forvalues i=1/`numberoftimes' {
>
> [declare some parameters as globals]
>
> do mysubfile.do                    // uses the set of parameters declared above
> }
> _____ example 1 ends here
>
> In example 1, mysubfile.do is just a sequence of Stata commands. Now
> suppose that you edit mysubfile.do with these lines
> ___
>
> capture prog drop mySubfile
> prog def mySubfile
>
> [content of old mysubfile.do goes here]
>
> end
> ___
>
> This will then require that you run the code in example 1 as follows:
>
> _____ example 2 starts here
>
> // this is my main file
>
> do mysubfile.do
>
> forvalues i=1/`numberoftimes' {
>
> [declare some parameters as globals]
>
> mySubfile                    // uses the set of parameters declared above
> }
> _____ example 2 ends here
>
> It seems to me that example 2, while slightly more work to write up,
> runs noticeably faster. Am I imagining things?
>
> Thank you,
>
> Gabi
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


