
Title | Encoding a string variable
Author | James Hardin, StataCorp
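The most common cause of this error message is that you are trying to use a string variable with a command that only supports numeric variables. You can check the type of a variable with the describe command: string variables have a storage type such as str4, whereas numeric variables have a storage type such as byte, int, long, float, or double.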
This is easy to fix.
If you have a string variable and want to convert it to a numeric variable, you can use the encode command. If you have a string variable that has only numbers in it, then you can alternatively use the real() function.
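The condition "has only numbers in it" matters because real() returns a missing value for any string it cannot read as a number. The following is a minimal sketch on a made-up four-observation dataset (the values, including the deliberately non-numeric "x3", are assumptions for illustration):

```stata
* Hypothetical data: the third value is not a valid number
clear
input str4 a
"1"
"2"
"x3"
"4"
end

* real() converts numeric strings and returns missing (.) otherwise
gen na = real(a)

* Count nonempty strings that failed to convert
count if missing(na) & a != ""
list a na
```

A nonzero count is a sign that the variable holds something other than numbers, in which case encode is probably the better choice.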
. describe

Contains data
  obs:             4
 vars:             2
 size:            48
------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
a               str4   %9s
b               str4   %9s
------------------------------------------------------------------------
Sorted by:
     Note: dataset has changed since last saved

. list

     +-------+
     | a   b |
     |-------|
  1. | 1   a |
  2. | 2   b |
  3. | 3   c |
  4. | 4   d |
     +-------+

. gen na = real(a)

. encode b, gen(nb)

. describe

Contains data
  obs:             4
 vars:             4
 size:            80
------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
a               str4   %9s
b               str4   %9s
na              float  %9.0g
nb              long   %8.0g       nb
------------------------------------------------------------------------
Sorted by:
     Note: dataset has changed since last saved

. list

     +-----------------+
     | a   b   na   nb |
     |-----------------|
  1. | 1   a    1    a |
  2. | 2   b    2    b |
  3. | 3   c    3    c |
  4. | 4   d    4    d |
     +-----------------+
Although nb is a numeric variable, it looks like a string variable because the encode command added value labels to it.
. list nb, nolab

     +----+
     | nb |
     |----|
  1. |  1 |
  2. |  2 |
  3. |  3 |
  4. |  4 |
     +----+
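To inspect the mapping between numeric codes and strings that encode stored, one option is label list. The sketch below re-creates the variable b from the example above with an input block (the input block is only a convenient way to rebuild the data and is not part of the original session):

```stata
* Rebuild the four-observation string variable b
clear
input str4 b
"a"
"b"
"c"
"d"
end

* encode assigns codes 1, 2, 3, ... to the strings in alphabetical order
* and stores the mapping in a value label named after the new variable
encode b, gen(nb)

* Show the code-to-string mapping held in the value label nb
label list nb
```

Because the codes follow alphabetical order of the strings, they carry no numeric meaning of their own; they only identify categories.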
Warning:
If the string variable that you are encoding has more than 65,536 unique values, encode will complain.
If that is the case, then you can use

. egen nb = group(b)

which will generate a numeric variable nb that does not have value labels.
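As a minimal sketch of the egen approach, the following uses a small made-up dataset with a repeated string (the values are assumptions for illustration) to show that group() numbers the distinct strings in sorted order and attaches no value label:

```stata
* Hypothetical data with a repeated value, entered out of order
clear
input str4 b
"d"
"a"
"c"
"a"
end

* group() numbers the distinct values of b in sorted order:
* "a" becomes 1, "c" becomes 2, "d" becomes 3
egen nb = group(b)

* nb lists as plain numbers because no value label is attached
list b nb
describe nb
```

Since no value label is created, the original strings can only be recovered by keeping b in the dataset alongside nb.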