RE: st: RE: do-files as programs
philippe van kerm <firstname.lastname@example.org>
Fri, 26 Sep 2008 14:28:27 +0200
This is interesting.
My understanding is that 1./2. is the 'preferred' approach, and that -program-s should be reserved for creating new commands, not for running simple batches of commands (that being the purpose of do-files).
One argument I see against using -program-s instead of multiple/nested do-files is the risk of inadvertently redefining a command. Try for example,
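pr def ml
    di "Do Maria Lisa's tasks here... some data manipulation for example"
end
pr def analysis
    dagumfit wage   // assumed call: the original body is not preserved; wage is a placeholder variable
end
ssc install dagumfit
analysis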
This fails because the user-defined -ml- inadvertently redefines the official -ml- command, which -dagumfit- uses internally. Not only does -dagumfit- stop working, but the user's -ml- is executed where it should not be -- and this can be a serious issue if it does something unwanted to the data, for example.
This means that one would need to be careful and constantly check for name conflicts whenever strategy 3./4. is adopted. So advising strategy 1./2. (with do-files) seems in general safer, in particular for novice users who may not notice the perils.
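One way to reduce this risk is to check a candidate name before defining the program. The lines below are only a sketch: the name "mltasks" is hypothetical, and the check relies on -which-, which returns a nonzero code when the name is neither a built-in command nor an ado-file on the adopath. It cannot catch commands that get installed later.

```stata
* Sketch: check a candidate program name before defining it.
* "mltasks" is a hypothetical name chosen for illustration.
local name mltasks

capture which `name'
if _rc == 0 {
    * -which- found a built-in command or an ado-file with this name
    display as error "`name' is already a Stata command; choose another name"
    exit 110
}

capture program drop `name'      // clear any earlier in-memory definition
program define `name'
    display "Do Maria Lisa's tasks here..."
end
```

Setting the local to ml instead would stop the do-file with return code 110 before the official command could be shadowed.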
But I'd be curious to read arguments for/against this claim. (The timing argument reported here is one!)
> -----Original Message-----
> From: email@example.com [mailto:owner-
> firstname.lastname@example.org] On Behalf Of Gabi Huiber
> Sent: Thursday, September 25, 2008 10:48 PM
> To: email@example.com
> Subject: Re: st: RE: do-files as programs
> Martin, thank you for the pointers. For all they're worth, here are my
> There are four basic ways, it seems to me, to organize a project in do-files:
> 1. You just make a list of instructions, executed one after another.
> If you need any of them executed more than once, put them inside a
> foreach loop. That's one do-file, and it will get as long and involved
> as the project demands it. It's the way we program when we are new to
> 2. You break up the problem into smaller do-files and have a master
> file call them as needed.
> Neither of the above makes use of the program feature. Do-files are
> read in as many times as they are used. 2. has certain advantages over
> 1. in ease of debugging and general readability, as short do-files are
> easier to pore over than long ones, but the project will have to rely
> on multiple inter-linked do-files instead of one. My guess is that the
> project doesn't have to be terribly complex before the advantages
> trump this drawback.
> 3. You make a do-file organized broadly as follows: in the first
> section you declare any programs you need, then in the second you
> invoke them as needed.
> 4. The programs at 3. above are saved as separate do-files, and a
> master file calls them in once with the "do file.do" command, then
> executes them as many times as needed by invoking their name. So, both
> 3. and 4. do make use of this "program" feature.
> My test project: I had to tabulate one dummy variable in 30 different
> files, then save the matcells in a master matrix. To make it last a
> little, I ran the same thing twice. I organized the project in the
> four ways above:
> 1. One do-file with no programs in it;
> 2. Four separate do-files organized as a master calling the other
> three twice each;
> 3. One do-file with three programs in it, each invoked twice; and
> 4. Four do-files, where the master was calling in the other three
> once, then invoking them each twice.
> The results are as follows: 3 was fastest at 6 seconds, followed by 4
> at 9 seconds or so. 2 and 1 were about equally bad, some 11-12 seconds
> This suggests that declaring do-files as programs increases
> productivity. I hope this helps somebody.
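A timing comparison along the lines reported above could be set up with Stata's -timer- commands. This is only a minimal sketch of strategy 3, not the code used in the thread: the program name "tabdummy", the matrix name "freq", and the use of the auto dataset shipped with Stata are all assumptions made for illustration.

```stata
* Strategy 3 sketch: declare the program once, then invoke it as needed.
* "tabdummy" and "freq" are hypothetical names; auto.dta stands in for the
* project files mentioned in the thread.
capture program drop tabdummy
program define tabdummy
    sysuse auto, clear
    tabulate foreign, matcell(freq)   // store the cell counts in matrix freq
    matrix list freq
end

timer clear 1
timer on 1
forvalues i = 1/2 {
    tabdummy                          // already in memory: no file is re-read
}
timer off 1
timer list 1                          // elapsed seconds for timer 1
```

To compare against strategy 2, the same commands could be placed in a separate do-file and the loop body replaced by a -do- call timed with a second timer. How large the difference is will depend on the machine and on the number and size of the files involved.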