Statalist



st: RE: What is good programming practice in Stata?


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: What is good programming practice in Stata?
Date   Thu, 19 Nov 2009 12:01:50 -0000

Joachim asked a (good!) question but then answered it ironically and delivered observations on what is _common_ programming practice. That's fine by me, but we can't have a very fruitful discussion on this without distinguishing _good_ and _common_. 

I'd say "style" here rather than "practice". Good programming practice certainly includes good structure of programs and good strategy in planning programs and designing syntax. Joachim's focus is more on the small stuff. 

He seems to have missed one piece of pontification: 

SJ-5-4  pr0018  Suggestions on Stata programming style
        N. J. Cox. Q4/05. SJ 5(4): 560-566. (no commands)
        Suggestions for good Stata programming style.

-- but that doesn't touch on many of the details in his post. 

We could talk about good writing style in English, Joachim could talk about good writing style in Swedish, and we might still agree that really good style was quite rare. So sampling programs written by Stata programmers, many of whom would probably regard themselves as definitely still learners, doesn't tell you much about good style, any more than sampling prose written by university students in their native language tells you much about good style. 

Also, please, if you want to talk about Stata programming you must use Stata terminology. I know what a parameter is in statistics, but not what it is as part of any Stata language. 

I think it's pretty clear, and spelled out in every history of Stata, that the major past influence on Stata is the Unix/C culture. The major present influence on Stata I would assert to be Stata, in that most programmers are broadly imitative of what they read. I've no data, but it seems that for many, Stata is their first and only serious programming language. A common question on starting Mata is: where are the local macros? 

The overarching question is: how far do we write in order to be read? I can think of four answers: 

1. Programmers should want to be able to read their own code easily, if only when they modify it at later dates. This I take to be obvious to all who code, but by itself it implies only that you should follow personal style rules. 

2. Programmers may write in collaboration with others. In Stata, most programs are really rather short by general computing standards and the modal number of programmers per program is easily one, so this doesn't often bite. 

3. Programmers may expect to write a program but have others take it over at some later date. This is standard in many institutions but not very common -- indeed a sensitive subject -- in Stata. Death, apparent abandonment or religious conversion (e.g. the original programmer becoming an adept of some other software) would be the main reasons for taking on someone else's code.  

4. Programmers might expect to be read by users. My own haphazard sampling suggests that most users have absolutely no intention of reading code even when it would be highly instructive. Personally, I write to be used rather than read, but I am not embarrassed at the thought of being read. 

On your specifics: 

#1 Do not use either Pascal casing (MyVar) or Camel casing (myVar) for
variables and parameters; just stick to lower case.

No rule against either, but I'd agree both are very uncommon styles in Stata. 

#2 Do not use meaningful and descriptive words to name variables

I'd agree that good names are a good idea. I don't think that's subversive. 

#3 Use as many single-character variables as you like, and certainly do not
comment on what they are

I'd say single character names are the norm for loop indexes and often when statistical conventions are being echoed. No statistical programmer I can imagine would regard it as obscure to call a response variable -y-. Putting -response-, -outcome- or whatever just bloats your code. But I'd often write something like -yvar- too. 

I'd agree there's a culture of commenting only sparsely in Stata. I think you'd find a chorus against over-commenting as just adding clutter. If you have to lean on the comments, you don't understand enough Stata to be able to understand the code! This culture may have grown out of the early days of Stata, when developers could just walk a short distance and ask for an explanation of tricky code. To my mind, there's an inevitable sameness about many Stata programs (syntax checking/data checking/preliminary calculation/main calculation/display of results/return saved results) that makes comment unnecessary unless you are writing code to be read in a course on Stata programming, in which case you are on your best behaviour and write artificially.

#4 Do not bother to use method names that are dissimilar to existing functions
(e.g., display versus Display)

I am not sure of your point here, but I'd say that case distinctions are rarely a good way to make code clear unless you have some personal style rules that you stick to absolutely. I'll sometimes use a case distinction very briefly to do something. 

#5 Do not separate logical groups of code

It's indeed common to get long unbroken code segments. There is no good defence except that it may not matter much. 

#6 There is no consensus about numbers of blank lines between different
methods in an ado-file.

That's probably correct. I don't like more than single blank lines. Double blank lines or more just lengthen program files to no good purpose. 

#7 Do not use single spaces before and after operators and brackets.

This is discussed in my 2005 paper. I'm a strong advocate of adding spaces for clarity, but keeping lines short is a conflicting aim. 

#8 By all means use as many abbreviations as possible

See above. 

Nick 
n.j.cox@durham.ac.uk 

Joachim Landström

I have been browsing around on the Internet trying to find any suggestions about
good programming practice in Stata and have failed to do so. Thus I pose
this question.

When I have a look at ados, it does seem to me that good programming
practice in Stata amounts to:
#1 Do not use either Pascal casing (MyVar) or Camel casing (myVar) for
variables and parameters; just stick to lower case.
#2 Do not use meaningful and descriptive words to name variables
#3 Use as many single-character variables as you like, and certainly do not
comment on what they are
#4 Do not bother to use method names that are dissimilar to existing functions
(e.g., display versus Display)
#5 Do not separate logical groups of code
#6 There is no consensus about numbers of blank lines between different
methods in an ado-file.
#7 Do not use single spaces before and after operators and brackets.
#8 By all means use as many abbreviations as possible
.
.
.

Well, I could continue, but the more I write, the more I feel that it is
becoming a list of bad programming practice. 

If we have a look at a good programming "code of conduct" in, e.g., C++ or
Java, we see extensive use of different types of casing separating classes,
methods, variables and parameters. Variables are given descriptive names,
commenting is sparse and largely unnecessary since descriptive names are
used, and abbreviations are avoided, as are single-character variables.
Single spaces are used both before and after operators and brackets. 

I could go on about this issue, but as I am rather new to Stata, my
empirical sample of ados may be biased, and that is why I raise this
question.

What is Good Stata Programming Practice?


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


