st: RE: Protecting users from their own stupidity

From   "Nick Cox" <>
To   <>
Subject   st: RE: Protecting users from their own stupidity
Date   Mon, 21 Oct 2002 20:54:07 +0100

Roger Newson
> A query about approved programming practice. In official Stata, most
> programs go out of their way to protect users from their own stupidity
> whenever there is a possibility that a user might accidentally over-write
> an existing data set in the memory. In particular, if a data set is present
> in the memory, then -use- will only input a new data set if the user
> specifies the -clear- option, and -exit- will not exit from Stata unless
> either the user specifies -clear- or the data set is unchanged. A notable
> exception is the -collapse- command, which routinely destroys pre-existing
> data sets, presumably because any user who uses -collapse- is assumed
> thereby to have consented implicitly to have the existing data set
> destroyed. People in the Stata community who write their own packages
> usually want their programs to conform to the same high level of
> user-friendliness as official Stata. Is there a general set of rules,
> approved by StataCorp or otherwise, regarding when programs should or
> should not routinely overwrite existing data sets in memory without a
> -clear- option? I ask because I have previously written ado-files (notably
> -parmest- and -dsconcat-) which can overwrite existing data sets in memory,
> and I have been advised (by Bill Gould) that they should do more than they
> do to protect users from their own stupidity (as -use- does and -collapse-
> doesn't).

I am not aware of a single source for this in Stata 
Corp documentation. 

What is most obvious, however, is not so much a set of rules 
as a growing awareness of the importance of this topic and the 
emergence of a variety of conventions. Of course,
Stata has long provided a series of devices to 
stop you or inhibit you from making substantial changes
to your data by accident. 

Wearing two hats on my bald head, (1) as a user-programmer 
and (2) as Executive Editor, Stata Journal, I 
endorse Roger's implication that this is an important area. 
What is more, it can be surprisingly contentious, 
especially when programmers codify what are in essence 
personal or site conventions about what is legitimate 
or sensible (including what may be legitimately or 
sensibly undocumented!) within programs which are
then circulated for wider use. 

In one recent program I saw, the user's data were
destroyed and replaced with another data set 
with absolutely _no_ indication in the help 
file or in the accompanying documentation that 
this would happen. In my view, this is totally 
unacceptable. Fortunately, the program could 
be, and was, rewritten to avoid this. 

In many other cases, something like the -sort- 
order may be changed without this being flagged. I 
have encountered programmer comments of the 
following forms, some flavour being added here, 
and some being my own attitudes when I too 
did this: 

1. "This will usually put the data in 
a more sensible order and so is really a 

2. "This never matters since any other program 
can, and indeed should, put the data in the -sort- 
order which it needs."

3. "All users who know what they are doing 
use identifiers, so at most you just 
need to -sort- to get back to an original status." 

This particular point could be debated at some 
length, but Stata Corp recently 
added the option to make programs -sortpreserve-, 
so this can always be avoided. 
More importantly, official Stata's code has
seen considerable tightening up on this point
over the last few years, although some gaps 

The following seem to be very widely 
used conventions: 

1. An option such as -replace- or -clear- 
should be required whenever replacing or clearing 
existing data is the result of the action. Such options 
can never be abbreviated. However, Stata is 
not completely consistent here. -collapse- 
and -contract- don't need such an option. 
Perhaps it is thought that the purpose 
and result of such commands are obvious. 

2. A small but growing habit is the use 
of a -force- option whenever it is 
thought a good idea to underline 
to users that some violence is being 

3. -nobreak- blocks protect the
data during delicate operations. 
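
To make conventions 1 and 3 concrete, here is a minimal sketch, 
not taken from any official or user-written command. The program 
name -keepfirst- and what it does are hypothetical. It assumes 
Stata 8 or later, so that c(changed) is available. 

```stata
* Hypothetical command -keepfirst-: reduces the data in memory to
* one observation per group defined by varlist.
program define keepfirst
    version 8
    * CLEAR is written in capitals, so the minimal abbreviation is
    * the whole word and the user must spell it out in full.
    syntax varlist [, CLEAR]

    * c(changed) is 1 if the data in memory have changed since they
    * were last saved. Refuse to go on unless -clear- was specified.
    if c(changed) & "`clear'" == "" {
        error 4
    }

    * The Break key is ignored until this block has finished, so the
    * data cannot be left sorted but only partly reduced.
    nobreak {
        sort `varlist'
        * Which observation comes first within a group is arbitrary.
        by `varlist': keep if _n == 1
    }
end
```

With changed data in memory, -keepfirst id- should stop with the 
same r(4) message that -use- gives, while -keepfirst id, clear- 
goes ahead. Whether a command of this kind should insist on -clear- 
at all is exactly the judgement call discussed under convention 1. 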

In the -stylerules- document on SSC I suggested 
the following guidelines. (The full context 
of that document is important for evaluating these.) 
Some may think these very severe, but 
in my own experience the more you write 
Stata programs, the more you want to put
the responsibility for changing the data
where it belongs, on the user. 

Respect for datasets 

In general, make no change to the data unless that is 
the direct purpose of your program or that is explicitly
requested by the user.  For example,

your program should not destroy the data in memory 
unless that is essential for what it does

you should not create new permanent variables on 
the side unless notified or requested

do not use variables, matrices, scalars or global 
macros whose names might already be in use: there is
absolutely no need to guess at names unlikely to occur, 
as temporary names can always be used (see help on 
tempvar, tempname, and tempfile)

do not change the type of a variable unless requested

do not even change the sort order of data: programs can 
easily be made sortpreserve.
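
As an illustration only, here is a sketch of a program written 
with these guidelines in mind. The name -ndistinct- and its task 
(counting the distinct values of one variable) are hypothetical 
and are not from -stylerules-. 

```stata
* Hypothetical command -ndistinct-: counts distinct values of a variable.
* sortpreserve restores the user's sort order when the program ends.
program define ndistinct, rclass sortpreserve
    version 8
    syntax varname [if] [in]

    * Temporary indicator of the observations to use; strok lets
    * string variables through.
    marksample touse, strok

    * A temporary variable cannot clash with any existing name and
    * is dropped automatically when the program ends.
    tempvar first

    sort `touse' `varlist'
    quietly by `touse' `varlist': generate byte `first' = (_n == 1) & `touse'
    quietly count if `first'

    display as text "distinct values of `varlist': " as result r(N)
    return scalar ndistinct = r(N)
end
```

Nothing permanent is added to the data, no variable changes type, 
and the sort order on exit is the sort order on entry. The result 
is left behind in r(ndistinct) for the user to pick up if wanted. 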
