Why doesn’t the destring command in Stata include an encode option?

Title   The destring command
Author Nicholas J. Cox, Durham University, UK

When destring was incorporated into Stata 7, it was largely rewritten. (In some ways, the original destring command violated Stata’s philosophy because it was too easy to change much of your dataset without the safeguard of having to spell out some injunction such as , replace.)

As a point of principle, it was decided to sharpen the distinction between destring and encode, more or less along the following lines:

  1. encode is designed for situations in which you have a string variable, typically containing meaningful nonnumeric text (e.g., male, female), and wish to have the equivalent information as a numeric variable with labels. This goes way back in Stata history.
  2. destring is designed for situations in which you have a string variable, typically containing meaningful numeric text (e.g., 1, 2), which you wish to convert to the numeric variable it should properly be. Usually, that variable is now string because of some mistake. Perhaps the mistake was yours, because what you initially typed in Stata's Data Editor was nonnumeric. Or, perhaps the numeric text is contaminated by nonnumeric text from some earlier operation (e.g., in a spreadsheet), and Stata spotted that.
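
The distinction can be seen on a small example. The following is only a sketch on made-up data: the variable names gender, income, and gender_n are hypothetical, and the commas in income stand in for the kind of nonnumeric contamination a spreadsheet might introduce.

```stata
* Toy data: both variables start out as strings (names are hypothetical)
clear
input str6 gender str6 income
"male"   "1,200"
"female" "950"
"female" "2,100"
end

* encode: text categories become integers with value labels
* (codes are assigned in alphabetical order of the text)
encode gender, generate(gender_n)

* destring: numeric text becomes numbers; ignore() strips the stray commas
destring income, replace ignore(",")

describe gender_n income
list, nolabel
```

Here encode leaves gender untouched and adds a labeled numeric variable, whereas destring with replace overwrites income in place, which is why that option has to be spelled out.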

(In fact, destring was written largely because people new to Stata had problems whenever they wrote descriptive text in the first row of the Data Editor, spreadsheet style. Using its “first impressions count” rule, Stata decided that any such column contained a string variable. Being able to escape this predicament with a single destring was often comforting to these users.)

The middle ground between these two situations arises when you realize that a string variable containing male and female is often of little use for data analysis in Stata, and that you want only the numeric equivalent (with value labels).

destring itself has a history in the Stata Technical Bulletin (STB) going back to 1997, and it may be that some users got used to the fact that it could encode en masse (and that, to these users, throwing away the original string variables was fine).

What do we suggest for anyone using Stata 7 or newer in this position?

Our best advice is that you can encode several variables at once using foreach, perhaps along the lines of the following code:

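        * varlist is a placeholder: list your string variables here
        foreach v of var varlist {
            * create a labeled numeric copy named E<original name>
            encode `v', gen(E`v')
        }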

Alternatively, consider the community-contributed program multencode (SSC).

If you want to drop the mistaken string variables and use the original names, type

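        * varlist is a placeholder for the same string variables as before
        foreach v of var varlist {
            * remove the string original, then give its name to the encoded copy
            drop `v'
            rename E`v' `v'
        }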
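
As an illustration, the two steps can be combined into a single loop. This is a sketch on made-up data, not the only way to do it: the variables gender and region are hypothetical, and the label() option is added here (it is not part of the advice above) so that each value label takes the original variable name rather than the temporary E name. Note that the E prefix assumes the original names are short enough to remain valid variable names once a character is added.

```stata
* Toy data with two string variables (names are hypothetical)
clear
input str6 gender str5 region
"male"   "north"
"female" "south"
"female" "north"
end

foreach v of varlist gender region {
    * encode into a temporary E-prefixed variable;
    * label() names the value label after the original variable
    encode `v', generate(E`v') label(`v')
    * drop the string original and reuse its name
    drop `v'
    rename E`v' `v'
}

describe
label list
```

The encoded variables end up at the end of the dataset; use order afterward if their original positions matter.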