The Stata listserver

st: New version of the -sencode- package on SSC


From   Roger Newson <roger.newson@kcl.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   st: New version of the -sencode- package on SSC
Date   Thu, 20 Feb 2003 19:45:47 +0000

Dear All

Thanks to Kit Baum, a new version of the -sencode- package is now available for download on SSC. Type -ssc describe sencode- or -net search sencode- to find out more.

-sencode- is a sequential version of -encode-. It encodes a string variable to a numeric variable, allocating numeric values to string values in order of first appearance of each string value in the data set, instead of in alphabetical order of string value as -encode- does. The new version is still a Stata 7 program, but has a new option -manyto1-, which allows the mapping from encoded numeric values to unencoded string values to be many-to-one.

If -manyto1- is specified, then -sencode- generates a numeric variable with 1 value per observation, equal in each observation to the position of that observation in the data set, and value labels mapping those values to the corresponding original string values in the input string variable. By default, if -manyto1- is not specified, -sencode- generates a numeric variable with 1 value per distinct string value in the input string variable, and the value in each observation is the sequential order of first appearance of the corresponding string value in the data set.

For instance, in the -auto- data, we might create a string variable -nation-, containing a car's country of origin, with values "France", "Germany", "Italy", "Japan", "Sweden" and "US". If we then type

sencode nation,gene(nationality)

then -sencode- will generate a new variable -nationality-, equal to 1 for "US", 2 for "Germany", 3 for "Japan", 4 for "Italy", 5 for "France", and 6 for "Sweden", with appropriate value labels. However, if, instead, we type

sencode nation,gene(nationality) many

then -sencode- will create a new variable -nationality- with values 1-74 (like _n), and a set of value labels mapping integers 1-52 to "US", 53-55 to "Germany", 56-59 to "Japan", 60 to "Italy", 61-63 to "Japan", 64-65 to "France", 66-69 to "Japan", 70-73 to "Germany", and 74 to "Sweden".
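
For readers who find the two mappings easier to follow as code, here is a minimal sketch of the logic in Python. It is not the -sencode- source, which is a Stata program. The function names (sequential_codes, alphabetic_codes) and the nine-observation list are hypothetical and chosen only for illustration; they are not the -auto- data. The sketch also ignores missing (empty) string values.

```python
def sequential_codes(values, manyto1=False):
    """Return (codes, labels) for a list of strings.

    Default: one code per distinct string, numbered in order of first
    appearance. With manyto1=True: one code per observation, equal to
    the observation's position (like _n), so several codes may carry
    the same label.
    """
    codes = []
    labels = {}
    if manyto1:
        for position, value in enumerate(values, start=1):
            codes.append(position)
            labels[position] = value
        return codes, labels

    first_seen = {}  # string value -> code assigned at first appearance
    for value in values:
        if value not in first_seen:
            first_seen[value] = len(first_seen) + 1
            labels[first_seen[value]] = value
        codes.append(first_seen[value])
    return codes, labels


def alphabetic_codes(values):
    """Codes in alphabetical order of string value, as -encode- allocates them."""
    order = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    labels = {i: v for v, i in order.items()}
    return [order[v] for v in values], labels


# Hypothetical toy data (NOT the auto data): 9 observations, 6 countries.
nation = ["US", "US", "Germany", "Japan", "Italy",
          "Japan", "France", "Germany", "Sweden"]

seq_codes, seq_labels = sequential_codes(nation)
many_codes, many_labels = sequential_codes(nation, manyto1=True)
alpha_codes, alpha_labels = alphabetic_codes(nation)
```

In this toy list the default mapping gives "US" the code 1 and "Sweden" the code 6, whereas the many-to-one mapping has 9 codes, with 4 and 6 both labelled "Japan".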

(Note that -sencode-, like -encode-, does not modify existing value labels. Therefore, if you try typing both the above commands in the same session, then it is necessary to drop the old variable and its value label before creating the new ones. If you type

sencode nation,gene(nationality)
lab list nationality
drop nationality
lab drop nationality
sencode nation,gene(nationality) many
lab list nationality

then -sencode- should work as specified.)

Best wishes

Roger

--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: roger.newson@kcl.ac.uk

Opinions expressed are those of the author, not the institution.
