Statalist



RE: st: RE: do-files as programs


From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: do-files as programs
Date   Fri, 26 Sep 2008 14:27:27 +0200

Actually, this is the first time I have come across a recommendation for
-program- instead of nested do-filing. I used to think that -program-s are
the more versatile tools, the ones you would turn to when conducting analyses
for several projects, not just one.
The naming issue simply requires you to -which- the command beforehand.
IMHO, that is not a major obstacle to the strategy described by Gabi.
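
A minimal sketch of such a check, assuming the name you have in mind is held
in a local macro (the macro name -candidate- is purely illustrative). Note
that -which- only looks among built-in commands and along the ado-path, so it
will not notice a program that was merely defined in memory earlier in the
session:

```stata
* Hypothetical name you intend to give a project-specific program
local candidate "ml"

* -which- raises an error when it finds nothing; -capture- traps that
* error so the do-file keeps running and _rc can be inspected
capture which `candidate'
if _rc == 0 {
    display as error "`candidate' already exists as a command; pick another name"
}
else {
    display as text "`candidate' was not found by -which-; the name looks free"
}
```

With -ml- as the candidate, the first branch should be taken, since -ml- is
an official command.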


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of philippe van kerm
Sent: Friday, September 26, 2008 2:28 PM
To: [email protected]
Subject: RE: st: RE: do-files as programs

This is interesting.

My understanding is that 1./2. is the 'preferred' approach, and that
-program-s should be kept for 'creating new commands', not for running
simple batches of commands (that being the purpose of 'do files').

One argument I see against using -program-s instead of multiple/nested
do-files is the risk of inadvertently redefining a command. Try for example,

  pr def ml
    di "Do Maria Lisa's tasks here... some data manipulation for example"
    end

  pr def analysis
    ssc install dagumfit
    sysuse auto
    dagumfit price
    end

  ml
  analysis

This fails because -ml- is inadvertently redefining the official -ml-
command which is used internally by -dagumfit-. Not only is -dagumfit- not
working anymore, but -ml- is executed where it should not -- and this can be
a serious issue if -ml- does something unwanted to the data, for example.

This means that one would need to be careful and constantly check for name
conflicts whenever strategy 3./4. is adopted. So advising strategy 1./2.
(with do-files) seems in general safer, in particular for novice users who
may not notice the perils.


But I'd be curious to read arguments for/against this claim. (The timing
argument reported here is one!)


Philippe


> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Gabi Huiber
> Sent: Thursday, September 25, 2008 10:48 PM
> To: [email protected]
> Subject: Re: st: RE: do-files as programs
>
> Martin, thank you for the pointers. For all they're worth, here are my
> findings:
>
> There are four basic ways, it seems to me, to organize a project in do-
> files:
>
> 1. You just make a list of instructions, executed one after another.
> If you need any of them executed more than once, put them inside a
> foreach loop. That's one do-file, and it will get as long and involved
> as the project demands. It's the way we program when we are new to
> Stata.
>
> 2. You break up the problem into smaller do-files and have a master
> file call them as needed.
>
> Neither of the above makes use of the program feature. Do-files are
> read in as many times as they are used. 2. has certain advantages over
> 1. in ease of debugging and general readability, as short do-files are
> easier to pore over than long ones, but the project will have to rely
> on multiple inter-linked do-files instead of one. My guess is that the
> project doesn't have to be terribly complex before the advantages
> trump this drawback.
>
> 3. You make a do-file organized broadly as follows: in the first
> section you declare any programs you need, then in the second you
> invoke them as needed.
>
> 4. The programs at 3. above are saved as separate do-files, and a
> master file calls them in once with the "do file.do" command, then
> executes them as many times as needed by invoking their name. So, both
> 3. and 4. do make use of this "program" feature.
>
> My test project: I had to tabulate one dummy variable in 30 different
> files, then save the matcells in a master matrix. To make it last a
> little, I ran the same thing twice. I organized the project in the
> four ways above:
>
> 1. One do-file with no programs in it;
> 2. Four separate do-files organized as a master calling the other
> three twice each;
> 3. One do-file with three programs in it, each invoked twice; and
> finally
> 4. Four do-files, where the master was calling in the other three
> once, then invoking them each twice.
>
> The results are as follows: 3 was fastest at 6 seconds, followed by 4
> at 9 seconds or so. 2 and 1 were about equally bad, some 11-12 seconds
> each.
>
> This suggests that declaring do-files as programs increases
> productivity. I hope this helps somebody.
>
> Gabi

<SNIP>


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



