Re: st: RE: tips to manage multiple do files?

From   Glenn Hoetker <[email protected]>
To   [email protected]
Subject   Re: st: RE: tips to manage multiple do files?
Date   Thu, 18 Sep 2008 11:28:47 -0500

I take a different approach, which is to keep all of a project's code in one honking large do file, divided into sections. I've pasted a skeleton version below. Since this is pretty idiosyncratic, let me explain what I'm trying to do in several places.

1. The "Header" section
(a) gives information on the file (in case I ever find a print-out of it somewhere and need to find it on the computer)

(b) lists what files are used and produced by each section. Great when I can't figure out where "important_data_file_1" comes into play. I could probably do the same just by searching, but I like this way.

2. The "MACROS" section
(a) I use the local macro "do_me" to control which sections get run for a given run of the do file. The sections are labeled a-z, and I just insert the letters of the sections I want to run into the do_me macro (here, I'll run parts a and b). More below.

(b) defines some temp variables, files, etc. just for future reference.

3. The MAIN CODE block
(a) Since I like to run my code and then examine the aftermath, I turn "more" off (set more off) so the output doesn't pause at every screenful.

(b) The line

if strpos("`do_me'","a")>0 {

works because strpos() returns the position of "a" within do_me, which is some number greater than 0 if I included "a" in my definition of do_me. In that case, everything within the braces defining section "a" will be run. If "a" isn't in do_me, strpos() returns 0 and nothing between the braces is run. And so on and so forth. This way I can go to one point of the do file (the Macros section), see what each section does and flip them on or off for a given run.
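As a minimal sketch of that switch in isolation (the two sections and their display messages are hypothetical placeholders, not part of the skeleton), running the following as a do file executes only section b:

```stata
* Hypothetical two-section example of the do_me switch.
* The display lines stand in for real section code.
local do_me "b"                        // run only section b on this pass

if strpos("`do_me'", "a") > 0 {        // strpos() returns 0: "a" is not in do_me
    display "running section a"
}
if strpos("`do_me'", "b") > 0 {        // strpos() returns 1: "b" found at position 1
    display "running section b"
}
```

Because do_me is a local macro, these lines need to be run together as one do file; a local defined in one interactive chunk is not visible to the next.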

Typically, section A is all about taking raw data and manipulating it into the data I'll use and then saving that for use by subsequent sessions. With any luck, I only have to run this section once. Section B is probably descriptive statistics, C-?? will be different modeling strategies. Then there'll be a section for producing publication-ready output.

The pluses I've found to this system are that everything is captured in one place, easy to reference, edit and run. I don't have to comment sections in and out to control what runs. There are minuses. Because the file is so long, typically, one has to watch it all scroll by on the screen. When editing, getting from one section to another can be a pain. I've solved this by always using text editors that allow for tags or bookmarks. I assign the tag "0" to the line "local do_me..."; the tag "A" to the starting line of section a, etc. Puts any part of the code within a few keystrokes.

I also make heavy use of comments throughout the code, largely because in my field (management), there may be 6 months between when I submit an article and when I see what referees want me to revise. I have to have some way of figuring out what the heck I did all that time ago!

Hopefully of some use.



----File follows----




Input files

Output files


Input files

Output files



local do_me "ab" /*Which sections to do */
// a. Description of what part "a" does
// b. Description of what part "b" does
// c.
// d.

local base
local base
tempvar tv1
tempvar tv2
tempfile tf1


set more off

/* A. Description */
if strpos("`do_me'","a")>0 {
    set memory 250m
    cd `base'

    // [Insert code here...]
    save working_data, replace
}

/* B. Description */
if strpos("`do_me'","b")>0 {
    set memory 250m
    cd `base'

    use working_data
    // [code here]
}

Glenn Hoetker
Associate Professor (Business, Law, Institute for Genomic Biology)
Director, Initiative on Science and Technology in the Pacific Century
Faculty Fellow, Academy for Entrepreneurial Leadership
University of Illinois
[email protected]

On Sep 18, 2008, at 11:06 AM, Rajesh Tharyan wrote:


Here is my take on (2).

If you are going to use Stata's do editor, then it can get real messy if you
put all the notes, comments and anything else into it. An alternative would be
to use some external text editor which can be integrated with Stata.
You can find more information here

I use Notepad++. When I started, I relied on the built-in do editor.
However, as my do files became more complicated, I found it difficult.
(Whether that complication was a good thing or not I do not know!) My main
problem was that when I had, say, three different ways of doing something in
the middle of a do file, my do files looked extremely complicated, and I was
loath to keep several different do files. I found the block feature very
useful for this. Another problem was replacing something across multiple do
files. I also found syntax highlighting useful. But more than that, even a
simple thing like being able to make a font bold or italicised improved
readability a lot, and the built-in editor doesn't allow that.

This issue has come up many times here, even in the short time that I have
been here. The built-in editor is certainly useful for doing small things
like testing code, say from some post on the Statalist or something like
that. Setting up the external editors may seem like a hassle, but the good
people here have made several useful pages available and it shouldn't be too
difficult to get things set up.


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Man Jia
Sent: 18 September 2008 16:10
To: [email protected]
Subject: st:tips to manage multiple do files?

Hi all,

I was wondering if anyone could share any tips for managing many do
files for one research project. Thanks for your help!
(1) How to make it easy to find some specific work in several do files?

Now I have to open most of them to see if the file has the part I'm
looking for. The thing is, it is kind of hard to remember clearly the
content of each file after even two or three days not working with
them. I tried to write an outline at the beginning of each file, but I
still have to open each of them to read the outline.
(2) detailed comments in do files
For me it's useful to have detailed comments to explain what I'm
doing in most parts in a do file. Those details include what the
commands are doing, anything I should be careful with in future,
sources I get the ideas from, why the alternative ways are not good, summary
of the results... But a do file has a limit on the number of lines in
it. So, writing a lot of comments means I have to create many do files.

So, is there any good habit to deal with this? Is it better to add just
concise comments in do files and take notes in other files?
Thanks for your time! I really appreciate it.
* For searches and help try:
