st: Re: value labels for string variables
--On Sunday, October 27, 2002 2:33 -0500 Richard wrote:
I don't see the issue here. Say that you have a million records, and one
string variable recorded therein is str2 state, AK..WY [DC PR]. You do not
want to store the 'long name' of the state in the database, so you set up a
new dataset with 50, 51 or 52 cases, containing str2 state. You encode
state into int statename, and you define a value label containing 50, 51 or
52 values for that integer variable, containing the long names of states.
SAS, SPSS, and S-Plus allow value labels for string variables. They also
allow value labels to be developed independently of the database being
value labeled (e.g., PROC FORMAT in SAS).
Stata does not, at least not without some (considerable) rigmarole.
Maybe Stata people will fix this.
At present the quickest solution seems to be developing a new
variable that uses the value label as the variable's value. This is not
database-wise efficient. And these labels are not easily reduced to short
strings; subtle disease distinctions are difficult to reduce to a few
If you then merge this dataset with your million-record dataset on state,
statename will contain the 'value labels' of state, but will not store them
as strings; it will store the integer value underlying. The overhead
associated with this strategy is merely one integer per case (in the case
of states, I could use a byte data type; in general an int will suffice).
The following dataset appears to have long names for 'statename'; in
reality it is an integer with a value label.
     state   var2       statename
  1.    MA    222   Massachusetts
  2.    MA    999   Massachusetts
  3.    MA    111   Massachusetts
  4.    ME    888           Maine
  5.    ME    333           Maine
  6.    NH    444   New Hampshire
  7.    NH    777   New Hampshire
  8.    VT    666         Vermont
  9.    VT    555         Vermont
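
For readers who want to see the mechanics outside Stata, here is a minimal sketch in Python of the same idea: a small lookup table maps each short string to an integer code, the code is what gets stored per record, and the long name is only attached at display time. The integer codes and the names merge_on_state and display are hypothetical, chosen for illustration; this is not how Stata implements value labels internally.

```python
# Lookup table: one entry per state. The integer codes are illustrative
# assumptions; any distinct integers would do.
state_to_code = {"MA": 1, "ME": 2, "NH": 3, "VT": 4}
code_to_label = {1: "Massachusetts", 2: "Maine", 3: "New Hampshire", 4: "Vermont"}

# Main dataset: (state, var2) pairs taken from the listing above.
records = [("MA", 222), ("MA", 999), ("MA", 111), ("ME", 888), ("ME", 333),
           ("NH", 444), ("NH", 777), ("VT", 666), ("VT", 555)]


def merge_on_state(rows):
    """Attach the integer code to each record, as a merge on state would."""
    return [(state, var2, state_to_code[state]) for state, var2 in rows]


def display(rows):
    """Show each row with the label substituted for the stored integer."""
    return [(state, var2, code_to_label[code]) for state, var2, code in rows]


merged = merge_on_state(records)   # what is stored: small integers
for row in display(merged):        # what is shown: long names
    print(*row)
```

The stored third column holds only integers; the long names live once in code_to_label, however many records there are.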
It does not seem to me that this strategy is onerous, and it is only
limited by the existing limits on labels. I do not imagine that the
overhead involved can be improved upon in other packages' implementation of
this feature; if you want to associate one of ~32K or ~64K string values
with a case, you need an appropriately sized integer.
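
The size argument can be checked with a little arithmetic. The sketch below, in Python, computes the minimum number of whole bytes needed to distinguish a given number of labels, and the resulting per-dataset overhead. The function names are hypothetical, and the figures are an information-theoretic floor: an actual package's storage types may hold slightly fewer usable codes per byte count (for example, if some values are reserved for missing data).

```python
def bytes_needed(n_labels):
    """Minimum whole bytes needed to give each of n_labels a distinct code."""
    if n_labels < 1:
        raise ValueError("need at least one label")
    bits = max(1, (n_labels - 1).bit_length())  # bits to count 0 .. n_labels-1
    return (bits + 7) // 8                      # round up to whole bytes


def total_overhead(n_records, n_labels):
    """Bytes added to the dataset by storing one code per record."""
    return n_records * bytes_needed(n_labels)


one_byte_case = bytes_needed(52)              # states plus DC and PR
two_byte_case = bytes_needed(65536)           # the ~64K case
overhead = total_overhead(1_000_000, 52)      # a million records, 52 labels
```

Under these assumptions, 52 state labels fit in one byte per record, so a million records carry about one megabyte of overhead, while anything up to 65536 distinct labels fits in two bytes per record.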