RE: st: Stata 11 data format
"Nick Cox" <firstname.lastname@example.org>
Tue, 30 Jun 2009 17:37:27 +0100
There are issues here at a variety of levels. One of the simplest is
that by default variable labels are used for display in several commands
for graphs and tables. Thus allowing what Markus wants could only be
accompanied by truncated variable label display in many contexts.
It seems to me that the main effort might well be focused on writing
preprocessors to turn long variable labels from other packages' files to
Stata notes, but I'm not volunteering.
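For what it is worth, here is a rough sketch of what such a preprocessor might look like, written in Python rather than Stata because the long labels have to be captured before Stata truncates them. It assumes the full labels have already been extracted from the other package's file into a dictionary by some other tool; the function names are made up for illustration, and the 80-character limit is the one quoted in this thread. Labels containing backticks or dollar signs would need extra escaping before being run as a do-file.

```python
# Hypothetical preprocessor sketch: the names below are illustrative and
# not part of any Stata or SPSS tool.
MAX_LABEL = 80  # variable label limit quoted in this thread


def label_commands(varname, label, limit=MAX_LABEL):
    """Return Stata command lines that keep a long label as a note."""
    label = " ".join(label.split())  # collapse line breaks and repeated blanks
    if len(label) <= limit:
        return ['label variable {} `"{}"\''.format(varname, label)]
    # Shorten the label itself and store the full text in a note.
    short = label[: limit - 3].rstrip() + "..."
    return [
        'label variable {} `"{}"\''.format(varname, short),
        "notes {}: {}".format(varname, label),
    ]


def build_do_file(labels):
    """labels: dict mapping variable name -> full label from the source file."""
    lines = []
    for varname, label in labels.items():
        lines.extend(label_commands(varname, label))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "inc": "Household income",
        "q17": "Q17: " + "very long question wording " * 4
               + "(asked of employed respondents only)",
    }
    print(build_do_file(example))
```

The output is meant to be saved as a do-file and run after the converted dataset is loaded, so that each shortened label is paired with a note holding the complete text.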
> Variable and dataset labels still have a maximum length of 80 characters.
> I am not sure what Markus wants to put in the labels that is longer
> than 80 characters, but Stata's ability to put -notes- on variables
> and the dataset as a whole is what I would recommend. Individual
> notes may be up to 67,784 characters long, and each variable and
> the dataset as a whole may have up to 9,999 notes.
As far as I know, other packages such as SPSS (or whatever it is called
now) do support longer labels for variables. The problem I see with the
limitation of 80 characters is that some data providers do not provide
native Stata data files. Converting data files, let's say from SPSS
format to Stata format, could lead to truncated variable labels if the
SPSS labels are longer than 80 characters. What's so annoying about this
is that sometimes the most interesting part of the label is at the end
and it is the end at which variable labels get truncated. I understand
that this is not Stata's problem per se. It may be the fault of the data
providers that create variable labels that are too long but still these
longer labels could contain valuable information. I don't see a reason
for Stata not having longer variable labels while value labels support
strings as long as 32,000 characters. If the problem is that Stata's
data format would not easily support longer variable labels (due to
performance or memory issues?), why not just save variable labels like
I am aware of the possibility of using notes and characteristics, and right
now this is the only way to preserve the information stored in longer
variable labels. It is just that variable labels are easier to access
than notes. This might have changed with the new variable/data editor? I
am thinking of people new to Stata who may find themselves puzzled
if variable labels are truncated.
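To see in advance which labels would lose their ends, one could run a small check before converting. The following is only a sketch under assumptions: it is in Python, it takes the full labels as a dictionary already extracted from the source file by some other means, the function name is hypothetical, and it treats the 80-character limit as a count of characters.

```python
MAX_LABEL = 80  # limit quoted in this thread; assumed here to count characters


def truncation_report(labels, limit=MAX_LABEL):
    """Return (varname, kept, lost) for every label longer than limit."""
    report = []
    for varname, label in labels.items():
        if len(label) > limit:
            # kept: what survives a cut at the limit; lost: the dropped tail
            report.append((varname, label[:limit], label[limit:]))
    return report


if __name__ == "__main__":
    labels = {
        "age": "Age in years",
        "q17": "Q17: " + "long question wording " * 4 + "(employed only)",
    }
    for varname, kept, lost in truncation_report(labels):
        print(varname, "would lose:", repr(lost))
```

Listing the lost tail separately makes it easy to spot cases where the informative part of a label sits at the end.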
Philip Ryan (email@example.com) and Markus Hahn
(firstname.lastname@example.org) asked about Stata 11's .dta format:
> Are there any differences between the .dta files produced by Stata 11
> and those produced by Stata 10? (which is to say, under Stata 11, does one
> need to use -saveold- or a new version of Stat/Transfer to maintain
> compatibility with Stata 10?)
The Stata 11 .dta format is identical to the Stata 10 .dta format.
-saveold-, in both Stata 11 and Stata 10, saves data in Stata 9's .dta
format.
> What interests me personally is whether variable/dataset labels can
> consist of more than 80 characters. In my view, this is a huge
> limitation and should be changed.
Variable and dataset labels still have a maximum length of 80 characters.
I am not sure what Markus wants to put in the labels that is longer
than 80 characters, but Stata's ability to put -notes- on variables
and the dataset as a whole is what I would recommend. Individual
notes may be up to 67,784 characters long, and each variable and
the dataset as a whole may have up to 9,999 notes.