Statalist



Re: st: RE: do-files as programs


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: do-files as programs
Date   Fri, 26 Sep 2008 08:39:10 -0400

Philippe--
I often define -program-s and even put them in .ado files just to run
series of commands, e.g. for data manipulations. I have even told
people I work with, and students, to put anything that might need to
be used more than 3 times in a -program- (the real optimal rule would
be much more complex). To be safe in naming such programs, one merely
has to avoid common words, and use -which- to make sure names are not
already in use.  But if I am calling the program from
dagumfit_analysis_2008.do, I might name the program file
dagumfit_analysis_2008_manipulations.ado for ease of tracking.
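
A minimal sketch of the name check described above, meant to be run from a do-file. The name mymanip is hypothetical and used only for illustration. -which- looks for built-in commands and ado-files; the extra -program list- step is an assumption added here to cover programs that exist only in memory.

```stata
* Hypothetical program name; replace with the name you intend to use.
local newname mymanip

* -which- returns a nonzero code if no built-in command or ado-file
* by that name is found, so _rc == 0 means the name is taken.
capture which `newname'
if _rc == 0 {
    display as error "`newname' already exists as a built-in command or ado-file"
    exit 110
}

* Also check for a program of that name already defined in memory.
capture program list `newname'
if _rc == 0 {
    display as error "`newname' is already defined in memory"
    exit 110
}

* The name appears to be free, so define the program.
program define `newname'
    display "data manipulations go here"
end
```

Setting the local to ml instead would stop at the first check, since -ml- is an official command.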

On Fri, Sep 26, 2008 at 8:28 AM, philippe van kerm
<philippe.vankerm@ceps.lu> wrote:
> This is interesting.
>
> I have the understanding that 1./2. is the 'preferred' approach, and that -program-s should be kept for 'creating new commands', not for running simple batches of commands (that being the purpose of 'do files').
>
> One argument I see against using -program-s instead of multiple/nested do-files is the risk of inadvertently redefining a command. Try for example,
>
>  pr def ml
>    di "Do Maria Lisa's tasks here... some data manipulation for example"
>    end
>
>  pr def analysis
>    ssc install dagumfit
>    sysuse auto
>    dagumfit price
>    end
>
>  ml
>  analysis
>
> This fails because -ml- is inadvertently redefining the official -ml- command which is used internally by -dagumfit-. Not only is -dagumfit- not working anymore, but -ml- is executed where it should not -- and this can be a serious issue if -ml- does something unwanted to the data, for example.
>
> This means that one would need to be careful and constantly check for name conflicts whenever strategy 3./4. is adopted. So advising strategy 1./2. (with do-files) seems in general safer, in particular for novice users who may not notice the perils.
>
>
> But I'd be curious to read arguments for/against this claim. (The timing argument reported here is one!)
>
>
> Philippe
>
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-
>> statalist@hsphsun2.harvard.edu] On Behalf Of Gabi Huiber
>> Sent: Thursday, September 25, 2008 10:48 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: RE: do-files as programs
>>
>> Martin, thank you for the pointers. For all they're worth, here are my
>> findings:
>>
>> There are four basic ways, it seems to me, to organize a project in do-
>> files:
>>
>> 1. You just make a list of instructions, executed one after another.
>> If you need any of them executed more than once, put them inside a
>> foreach loop. That's one do-file, and it will get as long and involved
>> as the project demands it. It's the way we program when we are new to
>> Stata.
>>
>> 2. You break up the problem into smaller do-files and have a master
>> file call them as needed.
>>
>> Neither of the above makes use of the program feature. Do-files are
>> read in as many times as they are used. 2. has certain advantages over
>> 1. in ease of debugging and general readability, as short do-files are
>> easier to pore over than long ones, but the project will have to rely
>> on multiple inter-linked do-files instead of one. My guess is that the
>> project doesn't have to be terribly complex before the advantages
>> trump this drawback.
>>
>> 3. You make a do-file organized broadly as follows: in the first
>> section you declare any programs you need, then in the second you
>> invoke them as needed.
>>
>> 4. The programs at 3. above are saved as separate do-files, and a
>> master file calls them in once with the "do file.do" command, then
>> executes them as many times as needed by invoking their name. So, both
>> 3. and 4. do make use of this "program" feature.
>>
>> My test project: I had to tabulate one dummy variable in 30 different
>> files, then save the matcells in a master matrix. To make it last a
>> little, I ran the same thing twice. I organized the project in the
>> four ways above:
>>
>> 1. One do-file with no programs in it;
>> 2. Four separate do-files organized as a master calling the other
>> three twice each;
>> 3. One do-file with three programs in it, each invoked twice and
>> finally
>> 4. Four do-files, where the master was calling in the other three
>> once, then invoking them each twice.
>>
>> The results are as follows: 3 was fastest at 6 seconds, followed by 4
>> at 9 seconds or so. 2 and 1 were about equally bad, some 11-12 seconds
>> each.
>>
>> This suggests that declaring do-files as programs increases
>> productivity. I hope this helps somebody.
>>
>> Gabi
>
> <SNIP>


