RE: st: RE: do-files as programs

From   philippe van kerm <>
To   "" <>
Subject   RE: st: RE: do-files as programs
Date   Fri, 26 Sep 2008 14:28:27 +0200

This is interesting.

My understanding is that approaches 1 and 2 are the 'preferred' ones, and that -program-s should be reserved for 'creating new commands', not for running simple batches of commands (which is what do-files are for).

One argument I see against using -program-s instead of multiple or nested do-files is the risk of inadvertently redefining an existing command. Try, for example:

  program define ml
    di "Do Maria Lisa's tasks here... some data manipulation for example"
  end

  program define analysis
    ssc install dagumfit
    sysuse auto, clear
    dagumfit price
  end

  analysis


This fails because the user-defined -ml- inadvertently redefines the official -ml- command, which -dagumfit- uses internally. Not only does -dagumfit- stop working, but the user's -ml- is executed where it should not be, which can be a serious problem if, for example, it does something unwanted to the data.

This means that one needs to be careful and constantly check for name conflicts whenever strategy 3 or 4 is adopted. So recommending strategy 1 or 2 (with do-files) seems safer in general, in particular for novice users who may not notice the perils.
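A minimal sketch of such a check, assuming a hypothetical helper called -checkname- (it is not an official command). It relies on -which-, which succeeds for built-in commands and for ado-files on the adopath, and refuses a name that is already taken. -which- does not see programs that exist only in memory, so this is only a partial safeguard.

```stata
* Hypothetical helper (not official Stata): refuse a program name that is
* already used by a built-in command or by an ado-file on the adopath.
capture program drop checkname
program define checkname
    args name
    * -which- exits with 0 if name is built-in or an ado-file, 111 otherwise
    capture which `name'
    if _rc == 0 {
        display as error "`name' is already a Stata command; pick another name"
        exit 110
    }
end

capture noisily checkname ml    // reports a conflict: -ml- is official
checkname mlisa_tasks           // silent, assuming no such ado-file exists
```

A project-specific prefix on every program name (mlisa_ above is just an illustration) is another cheap way to make collisions unlikely.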

But I'd be curious to read arguments for and against this claim. (The timing argument reported here is one!)
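The timing argument can also be probed in isolation. The following is an illustrative micro-benchmark, not a reproduction of the test in the quoted message: it times the same one-line task run 200 times from a do-file versus 200 times from a program defined once. The task (-summarize price- on the auto data), the repetition count, and the program name bench_task are arbitrary choices.

```stata
* Illustrative micro-benchmark: a do-file read on every call vs. a program
* defined once. Task, repetition count and names are arbitrary choices.
clear all
sysuse auto

* Write a one-line do-file to a temporary location
tempfile task
tempname fh
file open `fh' using "`task'.do", write replace
file write `fh' "quietly summarize price" _n
file close `fh'

* The same task as a program, defined once and kept in memory
program define bench_task
    quietly summarize price
end

timer clear
timer on 1
forvalues i = 1/200 {
    quietly do "`task'.do"
}
timer off 1

timer on 2
forvalues i = 1/200 {
    bench_task
}
timer off 2

erase "`task'.do"
timer list    // timer 1: do-file loop, timer 2: program loop, in seconds
```

Which loop wins, and by how much, will depend on the machine and the Stata version.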


> -----Original Message-----
> From: [mailto:owner-] On Behalf Of Gabi Huiber
> Sent: Thursday, September 25, 2008 10:48 PM
> To:
> Subject: Re: st: RE: do-files as programs
> Martin, thank you for the pointers. For all they're worth, here are my
> findings:
> There are four basic ways, it seems to me, to organize a project in
> do-files:
> 1. You just make a list of instructions, executed one after another.
> If you need any of them executed more than once, put them inside a
> foreach loop. That's one do-file, and it will get as long and involved
> as the project demands. It's the way we program when we are new to
> Stata.
> 2. You break up the problem into smaller do-files and have a master
> file call them as needed.
> Neither of the above makes use of the program feature. Do-files are
> read in as many times as they are used. Approach 2 has certain
> advantages over approach 1 in ease of debugging and general
> readability, as short do-files are easier to pore over than long
> ones, but the project will have to rely on multiple interlinked
> do-files instead of one. My guess is that the project doesn't have
> to be terribly complex before the advantages trump this drawback.
> 3. You make a do-file organized broadly as follows: in the first
> section you declare any programs you need, then in the second you
> invoke them as needed.
> 4. The programs in 3 above are saved as separate do-files, and a
> master file runs each of them once with the -do- command (which
> defines the programs), then executes the programs as many times as
> needed by invoking their names. So both 3 and 4 make use of the
> -program- feature.
> My test project: I had to tabulate one dummy variable in 30 different
> files, then save the -matcell()- results in a master matrix. To make
> the run last a little longer, I ran the same thing twice. I organized
> the project in the four ways above:
> 1. One do-file with no programs in it;
> 2. Four separate do-files organized as a master calling the other
> three twice each;
> 3. One do-file with three programs in it, each invoked twice; and,
> finally,
> 4. Four do-files, where the master ran the other three once, then
> invoked each of their programs twice.
> The results are as follows: approach 3 was fastest at 6 seconds,
> followed by approach 4 at 9 seconds or so. Approaches 2 and 1 were
> about equally slow, at some 11-12 seconds each.
> This suggests that declaring the code in do-files as programs makes
> a project run faster. I hope this helps somebody.
> Gabi
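A minimal sketch of strategy 3 applied to a test project like the one described above. Everything specific is an assumption for illustration: the 30 files are generated on the fly as example_file1.dta to example_file30.dta in the working directory, the 0/1 variable is called dummy, and the program names tab_onefile and tab_allfiles are hypothetical. Programs are declared in the first section and invoked in the second.

```stata
* Strategy 3 sketch: declare programs first, then invoke them.
* File names, variable name and program names are hypothetical.

* Create 30 example files, each with a 0/1 variable named dummy
set seed 12345
forvalues i = 1/30 {
    clear
    quietly set obs 100
    generate byte dummy = runiform() < 0.5
    quietly save example_file`i'.dta, replace
}

* --- Section 1: program declarations ---
capture program drop tab_onefile
program define tab_onefile
    args filename
    use "`filename'", clear
    * matcell() stores the frequencies as a 2x1 column vector
    tabulate dummy, matcell(freq)
    * Transpose to a 1x2 row so that files can be stacked as rows
    matrix freq = freq'
end

capture program drop tab_allfiles
program define tab_allfiles
    capture matrix drop master
    forvalues i = 1/30 {
        quietly tab_onefile example_file`i'.dta
        * nullmat() lets the first row start from an empty matrix
        matrix master = nullmat(master) \ freq
    }
end

* --- Section 2: invocation ---
tab_allfiles
matrix list master
```

The stacking step assumes both values of dummy occur in every file; if one file had only one level, -matcell()- would return a 1x1 matrix and the row append would fail with a conformability error.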

