Why doesn’t the destring command in Stata include an encode option?
Title: The destring command
Author: Nicholas J. Cox, Durham University, UK
Date: April 2001; updated April 2005
When
destring was incorporated into Stata 7, it was largely rewritten.
(In some ways, the original destring command violated Stata’s
philosophy because it was too easy to change much of your dataset without the
safeguard of having to spell out some injunction such as , replace.)
As a point of principle, it was decided to sharpen the distinction between
destring and encode, more or less along the following lines:
- encode is designed for situations in which you have a string
variable, typically containing meaningful nonnumeric text (e.g., male,
female), and wish to have the equivalent information as a numeric
variable with value labels. encode goes way back in Stata history.
- destring is designed for situations in which you have a string
variable, typically containing meaningful numeric text (e.g., 1,
2), which you wish to convert to the numeric variable it should
properly be. Usually, that variable is now a string because of some
mistake. Perhaps the mistake was yours, because what you initially
typed in Stata’s Data Editor was nonnumeric. Or perhaps the numeric
text was contaminated by nonnumeric text from some earlier
operation (e.g., in a spreadsheet), and Stata spotted that.
(In fact, destring was written largely because people new to Stata
had problems whenever they wrote descriptive text in the first row of the
Data Editor, spreadsheet style. Using its “first impressions
count” rule, Stata decided that any such column contained a string
variable. Being able to escape this predicament with a single
destring was often comforting to these users.)
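To make the distinction concrete, here is a minimal sketch on a made-up three-observation dataset. The variable names sex, income, sex_n, and income_n are hypothetical, chosen only for illustration, and the stray commas stand in for whatever nonnumeric contamination your own data might carry.

```stata
* Hypothetical toy data: both variables start out as strings
clear
input str6 sex str8 income
"male"   "1,200"
"female" "950"
"female" "2,310"
end

* encode: nonnumeric text becomes a numeric variable with value labels
encode sex, generate(sex_n)
label list sex_n

* destring: numeric text becomes the number it represents;
* ignore(",") strips the commas before conversion
destring income, generate(income_n) ignore(",")

describe
list
```

Both commands here use generate(), so the original string variables are left untouched until you decide to drop them.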
The middle ground between these two situations is the realization that a
string variable containing male and female is often not of much use for data
analysis in Stata and that all you want is the numeric equivalent (with
value labels).
destring itself has an STB history
going back to 1997, and it may be that some users got used to the fact that
it could encode en masse (and that, to these users, throwing away the
original string variables was fine).
What do we suggest for anyone using Stata 7 or newer in this position?
Our best advice is that you can encode several variables at once
using foreach, perhaps along the lines of the following code:
* replace varlist with the names of your string variables
foreach v of varlist varlist {
    encode `v', gen(E`v')
}
If you want to drop the mistaken string variables and use the
original names, type
* replace varlist with the same names used in the encode loop
foreach v of varlist varlist {
    drop `v'
    rename E`v' `v'
}
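If you are sure you no longer need the strings, the two steps can be folded into one pass. The following is only a sketch under stated assumptions: the variables sex and smoker are hypothetical, and the label() option is added here (it is not part of the code above) so that each value label takes the name of the original variable rather than the temporary E name.

```stata
* Hypothetical toy data with two string variables to be encoded
clear
input str6 sex str3 smoker
"male"   "yes"
"female" "no"
"female" "yes"
end

foreach v of varlist sex smoker {
    * label(`v') names the value label after the original variable
    encode `v', generate(E`v') label(`v')
    * discard the string version and reuse its name
    drop `v'
    rename E`v' `v'
}

describe
label list
```

Because drop is irreversible within the session, it is safer to run this on a copy of the data or after saving the dataset.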