The Stata listserver

st: RE: Convention RE ado's which may redefine value labels


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Convention RE ado's which may redefine value labels
Date   Tue, 19 Nov 2002 18:27:46 -0000

Joly.Patrick@ic.gc.ca
>
> I am writing an egen function which, among other things, assigns a new
> value label to the newly generated variable.  I am concerned that a label
> with the same name may already be defined, and I want existing labels to
> be protected.
> Several strategies could be used to get around this, such as checking
> whether a particular value label exists, combined with one of:
>
>    i)   exiting the routine while informing
>           the user that this particular label
>           is already defined;
>    ii)  requesting that the user specify a unique
>           label name via a labname() option; or
>    iii) generating a unique name using a -tempvar
>           labname- call.
>
> That said, I am more concerned with "good coding practice" and
> consistency than with the specifics of a solution for my routine.
>
> The programming conventions as discussed on Statalist, or implied by
> official Stata routines, provide clear guidance with respect to new
> variables and new data files.  For new variables, most routines use
> -capture confirm ...- early in the code to verify whether a variable
> already exists.  For new data files, users are usually given the option
> to specify -replace- to confirm that it is OK to overwrite data;
> otherwise, nothing is overwritten.
>
> I am not aware of a similar convention for value labels.  I looked at
> the behaviour of -encode- to see how Stata would behave if I generated a
> variable with a name identical to a previously defined label (NB:
> -encode- uses the new variable name for the label name).  It turns out
> that -encode- silently writes into an existing label of the same name
> (in the example below, it appends its codes to the user's label).  On
> the other hand, any -label define <labname>- statement is met with an
> error message if <labname> is already defined, unless the -modify-
> option is specified.  See my postscript below for an example.
>
> One may point to the fact that Stata lets users overwrite scalars and
> matrices at will to suggest that labels do not deserve any particular
> protection.  However, a fundamental difference between labels and
> scalars/matrices is that the latter are not saved with the data.  Hence,
> overwriting labels _could_ be viewed as a modification to the data.  I
> wrote _could_ since -describe- does not consider changes to value labels
> as being changes to the data.  Consequently, users may exit Stata
> without a warning to the effect that the data has not been saved.
>
> At any rate, I am not hinting that -encode- should behave one way or
> another -- for now, I am taking its behaviour as a given.  I am just
> wondering whether there is a preferred coding practice for preventing
> existing labels from being overwritten and, possibly, a justification
> for why changes to labels are not deemed to be changes to the data.
>
> P.S.:
>
> . * -encode- does not warn the user
> . * when a label is already defined
> . u c:\stata\auto, clear
> (1978 Automobile Data)
>
> . la def newvar 1 "a label" 2 "another label"
>
> . encode make, gen(newvar)
>
> . desc
>
> Contains data from c:\stata\auto.dta
>   obs:            74                          1978 Automobile Data
>  vars:            13                          7 Jul 2000 13:51
>  size:         3,774 (100.0% of memory free)
> -------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> -------------------------------------------------------------------------------
> make            str18  %-18s                  Make and Model
> price           int    %8.0gc                 Price
> mpg             int    %8.0g                  Mileage (mpg)
> rep78           int    %8.0g                  Repair Record 1978
> headroom        float  %6.1f                  Headroom (in.)
> trunk           int    %8.0g                  Trunk space (cu. ft.)
> weight          int    %8.0gc                 Weight (lbs.)
> length          int    %8.0g                  Length (in.)
> turn            int    %8.0g                  Turn Circle (ft.)
> displacement    int    %8.0g                  Displacement (cu. in.)
> gear_ratio      float  %6.2f                  Gear Ratio
> foreign         byte   %8.0g       origin     Car type
> newvar          long   %17.0g      newvar     Make and Model
> -------------------------------------------------------------------------------
> Sorted by:  foreign
>      Note:  dataset has changed since last saved
>
> . la list newvar
> newvar:
>            1 a label
>            2 another label
>            3 AMC Concord
>            4 AMC Pacer
>            5 AMC Spirit
> <snip>
>           75 VW Scirocco
>           76 Volvo 260
>
> .
> . * But -label define ...- has a safety feature
> . * (i.e. option modify) preventing a user from
> . * overwriting a label
> . u c:\stata\auto, clear
> (1978 Automobile Data)
>
> . encode make, gen(newvar)
>
> . la def newvar 1 "a label" 2 "another label"
> label newvar already defined
> r(110);
>
> end of do-file
> r(110);
>

I almost always tackle this by using
a command like

. tempname lblname

(which I think is what Patrick means
by his reference to -tempvar-).

The labels defined with
a tempname will be -save-d
with the data so long as they have been
attached to a variable, which is what
we are talking about.
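For illustration, here is a minimal sketch of that approach; it is not Patrick's routine, and the details are assumptions. It assumes Stata 10 or later (for -sysuse- and -missing()-), and the program name -mylabdemo-, its -generate()- option and its task (flagging missing values with a labelled 0/1 indicator) are hypothetical stand-ins for an egen function. The value label is named via -tempname-, so it cannot collide with a label the user has already defined.

```stata
* Hypothetical example program (not an official Stata command).
* The value label gets a -tempname- name, so no existing label is touched.
capture program drop mylabdemo
program define mylabdemo
    version 10
    syntax varname, GENerate(name)
    * refuse to clobber an existing variable, as official commands do
    confirm new variable `generate'
    * ask Stata for a fresh name to use for the value label
    tempname lblname
    quietly generate byte `generate' = missing(`varlist')
    label define `lblname' 0 "not missing" 1 "missing"
    label values `generate' `lblname'
end

* Usage: the user's own label -newvar- survives, even though the
* new variable is also called newvar (the case where -encode- bites)
sysuse auto, clear
label define newvar 1 "a label" 2 "another label"
mylabdemo rep78, generate(newvar)
label list newvar
describe newvar
```

The price is that the attached label gets an uninformative name (something like __000000); if that matters, a labname() option, as in Patrick's strategy (ii), can still be offered on top.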

I wasn't aware of -encode-'s power
to overwrite labels without authorisation.
As Patrick says, there is a range from
what is protected and cannot be changed
without an explicit command to what is deemed
transient and trivial. (And, over time,
Stata has been tightening up on this.)
Although value labels are somewhere in between,
in many instances the overwriting of a set of value labels could
have major implications for data management
and analysis. On these grounds I would suggest
that this feature is at best a misfeature.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP