
st: RE: Convention RE ado's which may redefine value labels

From   "David Moore" <>
Subject   st: RE: Convention RE ado's which may redefine value labels
Date   Tue, 19 Nov 2002 10:33:14 -0800

These are very good points.  In support of this idea, I note that many users
rely on value labels to document their data.  If these get clobbered
inadvertently, then so does the documentation.  Of course, I don't mean to
suggest that value labels can replace proper documentation, but even casual
reliance on value labels to define coded variables should be able to assume
that assigned labels will only change by intention.

-----Original Message-----
[]On Behalf Of
Sent: Tuesday, November 19, 2002 10:20 AM
Subject: st: Convention RE ado's which may redefine value labels

I am writing an egen function which, among other things, assigns a new value
label to the newly generated variable.  I am concerned that a label with the
same name may already be defined and wish that existing labels be protected.
Several strategies could be used to get around this such as checking whether
a particular value label exists, combined with either:

   i)   exiting the routine while informing
          the user that this particular label
          is already defined;
   ii)  request that the user specify a unique
          label name via a labname() option; or
   iii) generate a unique name using a -tempvar
          labname- call.
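
The three strategies listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested egen function: the program name -safelabel-, its values() option, and the choice of return code 110 are hypothetical, and the check relies on -capture label list- returning a nonzero code when the named label is not defined.

```stata
* Hypothetical helper (not part of official Stata): define a value label
* only when doing so cannot clobber an existing one.
program define safelabel, rclass
    * assumption: adjust the version number to the release in use
    version 7
    syntax , VALues(string asis) [ LABname(name) ]

    if "`labname'" == "" {
        * strategy (iii): no name supplied, so borrow a unique
        * temporary name; -tempvar- only supplies the name here,
        * no variable is created
        tempvar labname
    }
    else {
        * strategies (i) and (ii): stop if the requested label exists
        * and ask the user for another name via labname()
        capture label list `labname'
        if _rc == 0 {
            display as error "value label `labname' already defined"
            display as error "specify a different name in labname()"
            exit 110
        }
    }

    label define `labname' `values'

    * hand the name back so the caller can attach it to a variable
    return local labname `labname'
end
```

A call might look like -safelabel, values(1 "low" 2 "high") labname(size)-, followed by -label values myvar `r(labname)'-. Whether a label defined under a temporary name should be left in the data or renamed afterwards is a design choice this sketch leaves open.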

That said, I am more concerned with "good coding practice" and consistency
than with the specifics of a solution for my routine.

The programming conventions as discussed on Statalist, or implied by
official Stata routines, provide clear guidance with respect to new
variables and new data files.  For new variables, most routines use -capture
confirm ...- early in the code to verify whether a variable already exists.
For new data files, users are usually given the option to specify -replace-,
to clarify that it is OK to overwrite data; otherwise, nothing is
overwritten.

I am not aware of a similar convention for value labels.  I looked at the
behaviour of -encode- to see how Stata would behave if I generated a
variable with a name identical to a previously defined label (NB: -encode-
uses the new variable name for the label name).  It turns out that -encode-
overwrites an existing label of the same name.  On the other hand, any
-label define <labname>- statement is met with an error message if <labname>
is already defined, unless option -, modify- is specified.  See my
postscript below for an example.
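
As a hedged illustration of the contrast just described, one way to keep -encode- away from an existing label is to test for the label first and, if it exists, pass a different name through -encode-'s label() option. The names newvar and newvar_lbl are illustrative only, and the sketch assumes that newvar_lbl is not itself already defined.

```stata
* Guard an -encode- call against an existing value label
* (illustrative names; dataset path as in the postscript).
use c:\stata\auto, clear
label define newvar 1 "a label" 2 "another label"

capture label list newvar
if _rc == 0 {
    * a label called newvar exists: store the encoding elsewhere
    encode make, generate(newvar) label(newvar_lbl)
}
else {
    * no clash: let -encode- use its default label name
    encode make, generate(newvar)
}

* inspect both labels to confirm the original was left alone
label list newvar
label list newvar_lbl
```

Under these assumptions, the original two entries of newvar should be untouched, with the make-and-model coding stored under newvar_lbl instead.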

One may point to the fact that Stata lets users overwrite scalars and
matrices at will to suggest that labels do not deserve any particular
protection.  However, a fundamental difference between labels and
scalars/matrices is that the latter are not saved with the data.  Hence,
overwriting labels _could_ be viewed as a modification to the data.  I wrote
_could_ since -describe- does not consider changes to value labels as being
changes to the data. Consequently, users may exit Stata without a warning to
the effect that the data has not been saved.

At any rate, I am not hinting that -encode- should behave one way or another
-- for now, I am taking its behaviour as a given.  I am just wondering if
there exists a preferred coding practice to prevent existing labels from
being overwritten and, possibly, seeking a justification as to why changes to
labels are not deemed to be changes to the data.

Patrick Joly


. * -encode- does not warn the user
. * when a label is already defined
. u c:\stata\auto, clear
(1978 Automobile Data)

. la def newvar 1 "a label" 2 "another label"

. encode make, gen(newvar)

. desc

Contains data from c:\stata\auto.dta
  obs:            74                          1978 Automobile Data
 vars:            13                          7 Jul 2000 13:51
 size:         3,774 (100.0% of memory free)
              storage  display     value
variable name   type   format      label      variable label
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
newvar          long   %17.0g      newvar     Make and Model
Sorted by:  foreign
     Note:  dataset has changed since last saved

. la list newvar
           1 a label
           2 another label
           3 AMC Concord
           4 AMC Pacer
           5 AMC Spirit
          75 VW Scirocco
          76 Volvo 260

. * But -label define ...- has a safety feature
. * (i.e. option modify) preventing a user from
. * overwriting a label
. u c:\stata\auto, clear
(1978 Automobile Data)

. encode make, gen(newvar)

. la def newvar 1 "a label" 2 "another label"
label newvar already defined

end of do-file
