RE: st: Stata 11 data format

From	"Lachenbruch, Peter" <[email protected]>
To	<[email protected]>
Subject	RE: st: Stata 11 data format
Date	Wed, 1 Jul 2009 08:38:26 -0700
Amen to that.  I've had problems with "Survey Monkey" which converts
everything to an Excel file and plugs in the text of the question as
line 1 and some other information (the variable name often appears) on
line 2.  It also often gives a column to each possible answer (thus
Marital status has 4 or 5 columns where one would do).

Many of my "clients" do this, so there is a yes/no/not applicable/no
answer response in 4 variables when 1 will do.  I try to explain to
people that it's wasteful and a pain in the neck, and if I have to
convert these it will result in my spending 20 or more hours in doing
this (and I bill for the hours).  It's good to try to prevent this, but
when you get the data set already completed (or aren't aware of Survey
Monkey's nasty habit) there's not much to do.
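
For what it's worth, collapsing the one-column-per-answer layout can be
scripted before the data ever reach Stata.  The following is only a
minimal sketch in Python, not anything Survey Monkey or Stata provides:
the function name, the column names, and the assumption that exactly one
of the answer columns is non-blank for each respondent are all
hypothetical and would need checking against the actual export.

```python
def collapse_choice(row, columns, missing=None):
    """Collapse one-column-per-answer data into a single code.

    row     -- dict mapping column name to cell contents for one respondent
    columns -- the answer columns for one question, in coding order
    Returns the 1-based position of the answered column, or `missing`
    when every column is blank.
    """
    chosen = []
    for code, col in enumerate(columns, start=1):
        value = row.get(col)
        # Treat None, empty strings and whitespace-only cells as blank.
        if value is not None and str(value).strip():
            chosen.append(code)
    if not chosen:
        return missing
    if len(chosen) > 1:
        # More than one column filled in: not a single-choice item after all.
        raise ValueError("multiple answers in columns: %r" % (chosen,))
    return chosen[0]


# Hypothetical marital-status question spread over four columns.
marital_cols = ["Married", "Single", "Divorced", "Widowed"]
respondent = {"Married": "", "Single": "Single", "Divorced": "", "Widowed": ""}
marital = collapse_choice(respondent, marital_cols)
```

The resulting codes (1 to 4 here) can then get an ordinary value label
in Stata, so one variable does the work of four.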

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Joseph
Coveney
Sent: Tuesday, June 30, 2009 7:42 PM
To: [email protected]
Subject: RE: st: Stata 11 data format

Nick Cox wrote:

There are issues here at a variety of levels. One of the simplest is
that by default variable labels are used for display in several commands
for graphs and tables. Thus allowing what Markus wants could only be
accompanied by truncated variable label display in many contexts. 

It seems to me that the main effort might well be focused on writing
preprocessors to turn long variable labels from other packages' files to
Stata notes, but I'm not volunteering. 

Nick 
[email protected] 

Markus Hahn

Alan wrote:
> Variable and dataset labels still have a maximum length of 80
> characters.
> I am not sure what Markus wants to put in the labels that is longer
> than 80 characters, but Stata's ability to put -notes- on variables
> and the dataset as a whole are what I would recommend.  Individual
> notes may be up to 67,784 characters long, and each variable and
> the dataset as a whole may have up to 9,999 notes.

As far as I know, other packages such as Spss (or whatever it is called
now) do support longer labels for variables. The problem I see with the
limitation of 80 characters is that some data providers do not provide
native Stata data files. Converting data files, let's say from Spss
format to Stata format, could lead to truncated variable labels if the
Spss labels are longer than 80 characters. What's so annoying about this
is that sometimes the most interesting part of the label is at the end
and it is the end at which variable labels get truncated. I understand
that this is not Stata's problem per se. It may be the fault of the data
providers that create variable labels that are too long but still these
longer labels could contain valuable information. I don't see a reason
for Stata not having longer variable labels while value labels support
strings as long as 32,000 characters. If the problem is that Stata's
data format would not easily support longer variable labels (due to
performance or memory issues?), why not just save variable labels like
characteristics?
 
--------------------------------------------------------------------------------

I've had experience with variable label truncation when converting SAS
datasets (256-character limit, I believe) to Stata datasets.  In all but
one case that I can recall, the sender was putting value label
information into the variable label, for example (fictional, for
illustration), for a variable named CERVESS the label would be
'Cerebral Artery -- 1 = L ICA 2 = R ICA 3 = L MCA Seg I . . .'.  Often,
the length results from including the kind of metadata that doesn't
really belong in a variable label, and wouldn't normally be put there
except out of habit or concern that the value labels would be somehow
separated from the dataset in transit.

In the one exception, the variable labels contained the "question" text
from the data-collection forms (the text of the items as shown on the
survey instrument or questionnaire).  I had Stat/Transfer convert the
several dozen SAS datasets to SAS programs + ASCII data files, and used
the preprocessor approach that Nick mentions.  It's not a major chore to
prepare a do-file that -infile-s the resulting SAS programs into Stata
as string datasets and parses the LABEL sections into do-files,
directing the variable labels into -notes- associated with the
corresponding Stata variables.  This was in the days before Mata, and I
cut the text streams into 144-character chunks when bringing them in,
and re-assembled them into the -note-s via local macro variables.
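
The same preprocessing idea can be sketched outside Stata.  What follows
is not the original do-file, just a rough illustration in Python under
several assumptions: that the SAS program holds its labels in ordinary
LABEL statements of the form name = 'text', that no label contains a
semicolon, and that the full text should go into a note only when it
exceeds the 80-character limit.  The function names parse_sas_labels and
to_do_file are made up for the example.

```python
import re

# A LABEL statement runs from the keyword to the next semicolon
# (assumes no semicolons inside the label text itself).
LABEL_STMT = re.compile(r"\blabel\b(.*?);", re.IGNORECASE | re.DOTALL)
# name = 'text' or name = "text"; a doubled quote stands for a literal one.
PAIR = re.compile(r"""(\w+)\s*=\s*(?:'((?:[^']|'')*)'|"((?:[^"]|"")*)")""")


def parse_sas_labels(sas_source):
    """Return {variable name: label text} from the LABEL statements."""
    labels = {}
    for stmt in LABEL_STMT.finditer(sas_source):
        for name, single, double in PAIR.findall(stmt.group(1)):
            if single:
                text = single.replace("''", "'")
            else:
                text = double.replace('""', '"')
            # Collapse line breaks and runs of blanks left by wrapping.
            labels[name] = " ".join(text.split())
    return labels


def to_do_file(labels, max_len=80):
    """Emit Stata commands: a truncated label, plus a note if it was cut."""
    lines = []
    for name, text in labels.items():
        # Compound double quotes tolerate embedded double quotes.
        lines.append('label variable %s `"%s"\'' % (name, text[:max_len]))
        if len(text) > max_len:
            lines.append("notes %s: %s" % (name, text))
    return "\n".join(lines)


sas_program = """data demo;
  label CERVESS = 'Cerebral Artery -- 1 = L ICA 2 = R ICA'
        Q1 = 'Don''t know';
run;"""
do_file_text = to_do_file(parse_sas_labels(sas_program))
```

One caveat with this sketch: a left quote (`) or dollar sign in the
label text would be read as a macro reference when Stata runs the
generated do-file, so such characters would need escaping first.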

Joseph Coveney



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/