The Stata listserver

RE: st: Re: value labels for string variables

From   "Nick Cox" <>
To   <>
Subject   RE: st: Re: value labels for string variables
Date   Sun, 27 Oct 2002 17:02:34 -0000

Erik Ø. Sørensen
> The issue is only one of convenience. Sometimes, in particular for
> interactive use and exploration, it would have been convenient to have
> string value labels. Convenience is not a minor issue -- if it were, I
> would have chosen to do all statistical programming in C rather than
> Stata.
> My work is on yearly files 1986--2000 with on average about 4E6
> individuals in each file. A group of people use these files, and I
> should not change them on disk merely because it is convenient for
> /me/. Constantly merging in other files is a nuisance for interactive
> use (dealing with these data is slow enough as it is). Most people
> probably work on smaller data sets where keeping local modified
> versions is sensible, but I think enough people have data of this size
> to make it worthwhile to modify Stata itself if it would not slow
> Stata down a lot.
> It is not my intent to be difficult about this; if someone can come up
> with a good reason why Stata should /not/ have string labels, I am
> willing to modify my opinion. That a workaround is possible is not in
> itself a good reason.

I see various points emerging from this thread:

1. A question for Stata Corp: why have value labels not been
implemented for string variables? I don't know why, but I guess
because there was little apparent demand and because existing
functionality was thought to cover this, with an easy trick or two.

2. Richard and Erik report that with large data files the overhead
of an extra string variable can be substantial. I can believe it.

3. Kit Baum, Nick Winter and others have explained that with an
-encode- and a -merge-, you can get to where you want to be.
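
As a rough illustration of point 3, the -encode- and -merge- route
might look like the sketch below. All file and variable names
(master.dta, icd9_defs.dta, icd9, descr, icd9_lab) are made up for
the example, and the -merge m:1- syntax belongs to later Stata
releases than the one current when this thread was written. It
assumes the definitions file has exactly one row per code and that
no two codes share the same definition text.

```stata
* Sketch only. Hypothetical names: icd9_defs.dta holds one row per code
* with string variables icd9 (the code) and descr (its definition);
* master.dta holds the string variable icd9.

* Step 1: in the small definitions file, build a numeric variable
* whose value labels are the definition texts.
use icd9_defs, clear
encode descr, generate(icd9_lab)
save icd9_defs_labelled, replace

* Step 2: bring the labelled variable into the large file, matching
* on the string code. Value-label definitions travel with the merge.
use master, clear
merge m:1 icd9 using icd9_defs_labelled, keepusing(icd9_lab) keep(master match) nogenerate

* Step 3: tabulations now show definitions rather than raw codes.
tabulate icd9_lab
```

The added variable is numeric, so it should cost less storage per
observation than carrying the full definition as a string.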

Now even if labels could be attached to a string variable, the last
thing you want to do, in examples like ICD-9, is to sit there typing
out short and long definitions as a series of elements in a
-label define-. The mapping would be best set up automatically from
a file containing just the definitions, and then you would -merge-,
or use a do-file containing the -label- definitions. This would seem
to me to be essentially parallel to what is being suggested now,
which of course need not wait for Stata Corp to change Stata.
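
One way to set up the mapping automatically from a definitions file,
and to end up with a do-file of -label- definitions, is sketched
below. The names icd9_defs.dta, icd9, descr, icd9_id, icd9lbl and
icd9_labels.do are all hypothetical, and the definitions file is
assumed to hold one row per code.

```stata
* Sketch only; file and variable names are hypothetical.
use icd9_defs, clear
sort icd9

* Integer key, one per code, to carry the labels.
generate long icd9_id = _n

* Define one label entry per row, taking the text from descr.
forvalues i = 1/`=_N' {
    label define icd9lbl `i' `"`=descr[`i']'"', add
}
label values icd9_id icd9lbl

* Keep the code-to-key mapping, and write the label definitions
* out as a do-file that other users can run.
save icd9_key, replace
label save icd9lbl using icd9_labels.do, replace
```

Anyone who then merges icd9_id into their own copy of the data can
run icd9_labels.do and type -label values icd9_id icd9lbl- to attach
the definitions.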

What is more, you clearly need to do this only once for a given dataset.

