Stata The Stata listserver

st: RE: RE: suggestion for Stata 8: value labels for string variables

From   "HealthMaps" <>
To   <>
Subject   st: RE: RE: suggestion for Stata 8: value labels for string variables
Date   Fri, 25 Oct 2002 09:45:22 -0700

I have string variables containing codes (1 million records; the database
works very well in Stata at 100+ MB):

CAUSE (a string variable)            AGE  ... more variables
A391                                 71
A483                                 32
A985                                 45
B222                                 85
B230                                 91
D469                                 56
D593                                 74
D65                                  44
D762                                 58
D814                                 65
D820                                 69
D821                                 72
D824                                 85

A sample of the codes (taken from a PROC FORMAT statement in SAS) is below.
There are thousands of them, and any one database will contain a subset of
these codes. They are shortened about as much as they can be; there are
subtle distinctions in some codes for closely related conditions.

'A391'='Waterhouse Friderichsen syndrome'
'A483'='Toxic shock syndrome'
'A985'='Hemorrhagic fever w renal syndrome'
'B222'='HIV dis wasting syndrome'
'B230'='Act HIV infect syndrome'
'D469'='Myelodysplastic syndrome, unspec'
'D593'='Hemolytic uremic syndrome'
'D65' ='Disseminated intravascular coagulation [defibrination syndrome]'
'D762'='Hemophagocytic syndrome, infect assoc'
'D814'='Nezelofs syndrome'
'D820'='Wiskott Aldrich syndrome'
'D821'='Di Georges syndrome'
'D824'='Hyperimmunoglobulin E [IgE] syndrome'

Then, when I produce a frequency table or almost anything else where the
code is displayed, I want to see the value label instead of the code.

I do not want to see

CAUSE     frequencies or whatever

on my output, but rather:
 CAUSE                                      frequencies or whatever
 Waterhouse Friderichsen syndrome
 Toxic shock syndrome
 Hemorrhagic fever w renal syndrome
 HIV dis wasting syndrome
 Act HIV infect syndrome
 Myelodysplastic syndrome, unspec
 Hemolytic uremic syndrome
 Disseminated intravascular coagulation [defibrination syndrome]
 Hemophagocytic syndrome, infect assoc
 Nezelofs syndrome
 Wiskott Aldrich syndrome
 Di Georges syndrome
 Hyperimmunoglobulin E [IgE] syndrome
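The display-time lookup described above (keep the short code in the data and show the long label only on output, much as a SAS format does) can be sketched in a few lines. This is Python rather than Stata, and the `LABELS` dictionary, the `labelled_frequencies` function and the sample `cause` list are hypothetical, built from a few of the codes quoted above.

```python
from collections import Counter

# Hypothetical lookup table: short code -> long label, playing the role
# of the SAS PROC FORMAT statement (a subset of the codes quoted above).
LABELS = {
    "A391": "Waterhouse Friderichsen syndrome",
    "A483": "Toxic shock syndrome",
    "B222": "HIV dis wasting syndrome",
    "D820": "Wiskott Aldrich syndrome",
}


def labelled_frequencies(codes, labels):
    """Count the codes, then pair each count with its label.

    The data keep only the short codes; labels are applied at display
    time. A code with no label falls back to the raw code.
    """
    counts = Counter(codes)
    return [(labels.get(code, code), n) for code, n in sorted(counts.items())]


# Hypothetical sample of the CAUSE variable.
cause = ["A391", "B222", "A391", "D820", "X999"]
for label, n in labelled_frequencies(cause, LABELS):
    print(f"{label:<40} {n}")
```

The stored data never change; only the printed table uses the long names.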

Clearly this is an extreme case, but the same problem comes up with ZIP Codes
or other geographic identifiers from a GIS that are stored as strings, often
with periods in them (like US census tracts). I can't use these as numbers,
and I need to link value labels to them (like the name of the ZIP Code).

Richard Hoskins

-----Original Message-----
[]On Behalf Of Nick Cox
Sent: Friday, October 25, 2002 9:15 AM
Subject: st: RE: suggestion for Stata 8: value labels for string

> I asked this list about the possibility of labeling string variables,
> like ICD-9 or ICD-10 codes or census tract identifiers, with value
> labels. An ominous silence.
> I asked tech support, who did everything possible to help me out by
> suggesting workarounds. In the end, Stata does not allow value labeling
> of string variables (I hope I am wrong...). Tech support has very nice
> people. Unfortunately, all my data are loaded with string variables for
> codes for various diseases, hospital procedures, geographic codes, etc.,
> and putting those labels into the database would significantly enlarge
> it. I tried the encode route, but this means that every database I have
> with the same set of codes has a different set of encoded values.
> So I suggest that the capacity to value-label string variables be added
> to Stata 8. It is something one can do in SAS, SPSS, and S-Plus.
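The encode problem can be sketched in Python (not Stata; `encode_per_dataset`, `encode_with_master`, `MASTER_CODES` and the two small datasets are hypothetical). Numbering only the codes present in each dataset gives the same code different integers in different databases, whereas looking codes up in one shared codebook keeps the numbers stable. In Stata the analogous idea would be to build one value label for the full code list and reuse it in every dataset, for example through the `label()` option of `encode`; check `help encode` in your version before relying on that.

```python
# Hypothetical master codebook: every code gets one fixed integer,
# assigned once from the full code list rather than from any one dataset.
MASTER_CODES = ["A391", "A483", "A985", "B222", "B230", "D469"]
MASTER = {code: i for i, code in enumerate(MASTER_CODES, start=1)}


def encode_per_dataset(codes):
    """Number the distinct codes in *this* dataset 1..k in sorted order."""
    mapping = {code: i for i, code in enumerate(sorted(set(codes)), start=1)}
    return [mapping[c] for c in codes]


def encode_with_master(codes, master):
    """Look each code up in the shared codebook, so the numbering is the
    same in every dataset. An unknown code raises KeyError instead of
    silently receiving a new number."""
    return [master[c] for c in codes]


db1 = ["A391", "B222", "D469"]
db2 = ["B222", "D469"]

# Per dataset: B222 becomes 2 in db1 but 1 in db2.
print(encode_per_dataset(db1), encode_per_dataset(db2))
# Shared codebook: B222 is 4 in both.
print(encode_with_master(db1, MASTER), encode_with_master(db2, MASTER))
```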

You can't attach value labels to string variables
in Stata. You can of course attach value labels
to numeric variables.

However, I don't understand what is wanted here.
I'd like to know what tasks are made difficult or impossible
because you can't have this.

I see string values as being their own
value labels. If you want, as it were, short
and long versions of some string variable,
then they can be put in two related string
variables.
I can see issues in terms of:

1. storage overhead

2. smart ways of abbreviating long to short

but please spell out why you want this.
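The storage overhead of keeping a second, long string variable can be estimated with simple arithmetic. The Python sketch below assumes fixed-width string storage at 1 byte per character, with every record padded to the longest value (my understanding of how Stata's str# types of this era are stored); `LABELS` is a hypothetical subset of the codes quoted earlier, and the 1 million records come from the thread.

```python
# Hypothetical subset of the code/label pairs quoted earlier in the thread.
LABELS = {
    "A391": "Waterhouse Friderichsen syndrome",
    "A483": "Toxic shock syndrome",
    "A985": "Hemorrhagic fever w renal syndrome",
    "D824": "Hyperimmunoglobulin E [IgE] syndrome",
}

N_RECORDS = 1_000_000  # record count mentioned in the thread


def fixed_width_bytes(values, n_records):
    """Bytes for one fixed-width string column: every record is padded
    to the longest value (assumes 1 byte per character)."""
    width = max(len(v) for v in values)
    return width * n_records


code_col = fixed_width_bytes(LABELS.keys(), N_RECORDS)     # width 4
label_col = fixed_width_bytes(LABELS.values(), N_RECORDS)  # width 36
# A lookup table stores each code and label once, not once per record.
lookup_table = sum(len(k) + len(v) for k, v in LABELS.items())

print(f"code column:         {code_col:>12,} bytes")
print(f"long-label column:   {label_col:>12,} bytes")
print(f"lookup table (once): {lookup_table:>12,} bytes")
```

Under these assumptions the long-label column costs nine times as much as the code column, while a shared lookup table costs almost nothing by comparison.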


*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC