The Stata listserver

st: Re: value labels for string variables


From   baum <baum@bc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: value labels for string variables
Date   Sun, 27 Oct 2002 10:51:50 -0500

--On Sunday, October 27, 2002 2:33 -0500 Richard wrote:

> SAS, SPSS, and S-Plus allow value labels for string variables. They also
> allow value labels to be developed independently of the database being
> labeled (PROC FORMAT).
>
> Stata does not, at least not without some (considerable) rigmarole.
>
> Maybe Stata people will fix this.
>
> At present the quickest solution seems to be to create a new variable
> that holds the value label as its value. This is not efficient in terms
> of database storage. And these labels are not easily reduced to short
> strings; subtle disease distinctions are difficult to reduce to a few
> characters.

I don't see the issue here. Say that you have a million records, and one of the string variables is str2 state, holding the postal codes AK through WY (plus DC and PR, if present). You do not want to store the 'long name' of the state in the database, so you set up a new dataset with 50, 51 or 52 cases containing only str2 state. You encode state into an integer variable, statename, and define a value label for statename that maps its 50, 51 or 52 values to the long names of the states.
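The lookup-dataset step can be illustrated outside Stata. The following is a minimal Python sketch, not Stata syntax: it numbers the short state codes 1, 2, ... in sorted order (which is how encode assigns codes by default, as I understand it) and keeps a separate code-to-long-name mapping that plays the role of the value label. STATE_NAMES and build_lookup are hypothetical names, and only four states are included.

```python
# Hypothetical toy lookup table: short code -> long name (4 of the 50-52 states).
STATE_NAMES = {
    "MA": "Massachusetts",
    "ME": "Maine",
    "NH": "New Hampshire",
    "VT": "Vermont",
}


def build_lookup(names):
    """Return (code_of, label_of) for a dict of short code -> long name.

    code_of:  short string -> integer code, numbered from 1 in sorted order
              (an analogue of encode's default numbering).
    label_of: integer code -> long name (an analogue of a value label).
    """
    code_of = {}
    label_of = {}
    for code, abbrev in enumerate(sorted(names), start=1):
        code_of[abbrev] = code
        label_of[code] = names[abbrev]
    return code_of, label_of


code_of, label_of = build_lookup(STATE_NAMES)
print(code_of)
print(label_of)
```

Here code_of["ME"] is 2 and label_of[3] is "New Hampshire"; the short strings and long names live only in this small table, not in the large dataset.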

If you then merge this dataset with your million-record dataset on state, statename will carry the 'value labels' of state, but it will not store them as strings; it stores only the underlying integer. The overhead of this strategy is therefore one integer per case (for states, a byte storage type would do; in general an int will suffice). In the following listing, statename appears to hold long state names; in reality it is an integer variable with a value label attached.

       state   var2   statename
  1.   MA       222   Massachusetts
  2.   MA       999   Massachusetts
  3.   MA       111   Massachusetts
  4.   ME       888   Maine
  5.   ME       333   Maine
  6.   NH       444   New Hampshire
  7.   NH       777   New Hampshire
  8.   VT       666   Vermont
  9.   VT       555   Vermont
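The merge and its per-case overhead can be sketched the same way. In this hypothetical Python analogue (again not Stata code), the nine rows of the listing each gain a one-byte signed integer code through a many-to-one lookup on state, and the long name is produced only when a row is displayed. CODE_OF, LABEL_OF and show are illustrative names; the array typecode "b" stands in for a byte storage type.

```python
from array import array

# Hypothetical lookup built from the small state dataset:
# short code -> integer, and integer -> long name (the "value label").
CODE_OF = {"MA": 1, "ME": 2, "NH": 3, "VT": 4}
LABEL_OF = {1: "Massachusetts", 2: "Maine", 3: "New Hampshire", 4: "Vermont"}

# The large dataset, here only the nine (state, var2) rows of the listing.
states = ["MA", "MA", "MA", "ME", "ME", "NH", "NH", "VT", "VT"]
var2 = [222, 999, 111, 888, 333, 444, 777, 666, 555]

# Many-to-one "merge" on state: each row gains a 1-byte signed integer
# code (typecode "b"), never a copy of the long name.
statename = array("b", (CODE_OF[s] for s in states))


def show(i):
    """Render row i as a labelled listing would: the code is mapped to its label on display."""
    return f"{i + 1}. {states[i]} {var2[i]} {LABEL_OF[statename[i]]}"


for i in range(len(states)):
    print(show(i))
```

Each row stores statename.itemsize bytes (1 here) for the label, however long the long name is.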

It does not seem to me that this strategy is onerous, and it is limited only by Stata's existing limits on value labels. Nor do I imagine that other packages' implementations of this feature can improve on its overhead: if you want to associate one of ~32K or ~64K string values with a case, you need an appropriately sized integer (two bytes, signed or unsigned respectively).
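The integer-size argument is simple arithmetic: codes 1..n fit in a w-byte integer when n is at most 2^(8w-1) - 1 (signed) or 2^(8w) - 1 (unsigned), so ~32K labels fit in two signed bytes and ~64K in two unsigned bytes. A minimal Python sketch of that calculation follows. bytes_per_case is a hypothetical helper, and it assumes plain integer ranges, whereas Stata's own byte and int types reserve part of their range for missing values, so their usable maxima are somewhat lower.

```python
def bytes_per_case(n_labels, signed=True):
    """Smallest of 1, 2, 4 or 8 bytes whose range holds codes 1..n_labels.

    Assumes plain two's-complement (signed) or unsigned ranges; Stata's
    storage types reserve some values for missing, which this ignores.
    """
    if n_labels < 1:
        raise ValueError("need at least one label")
    for width in (1, 2, 4, 8):
        bits = 8 * width - (1 if signed else 0)
        if n_labels <= 2 ** bits - 1:
            return width
    raise ValueError("too many labels for a 64-bit code")


# Per-case cost of storing one long name as a string versus as a code.
long_name = "Massachusetts"
print(len(long_name.encode("ascii")), "bytes as a string vs",
      bytes_per_case(52), "byte as a code")
```

With 52 states the code needs a single byte per case; the same helper returns 2 for 32,767 signed or 65,535 unsigned labels, and 4 just past either limit.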

Kit
