The Stata listserver

Re: st: RE: Dealing with string variables

From   Ronán Conroy <[email protected]>
To   [email protected]
Subject   Re: st: RE: Dealing with string variables
Date   Wed, 10 Nov 2004 16:45:28 +0000

Nick Cox wrote:

> drop if grain == "wheat"
> value labels for string variables: not allowed
> what you want is better done by
> gen Crop = "crop" if inlist(crop, "wheat", "barley")
> replace Crop = "root" if inlist(crop, "potato", "cassava", "yams", "beet")
> ...
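The string-to-string recoding quoted above can be sketched as a self-contained do-file. The toy data are hypothetical, and the group names "crop" and "root" are copied from the quoted code; strings matching neither list are left as empty strings.

```stata
* Hypothetical toy data: a string variable -crop- (values assumed for illustration)
clear
input str16 crop
"wheat"
"barley"
"potato"
"cassava"
"yams"
"beet"
"not applicable"
end

* Group the strings into a new string variable, as in the quoted code
generate Crop = "crop" if inlist(crop, "wheat", "barley")
replace Crop = "root" if inlist(crop, "potato", "cassava", "yams", "beet")

* Observations matching neither list keep an empty string ("")
tabulate Crop, missing
```

Because the result is still a string variable, it cannot carry a value label.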

The same thing can also be done using -encode-:

. label define crop 1 "wheat" 2 "barley" 3 "root", modify
. encode crop, gen(Crop) label(crop)
. replace Crop=3 if inlist(crop, "potato", "cassava", "yams", "beet")
. replace Crop=. if Crop > 3

1. Start by defining a value label that assigns numeric codes to the strings in a logical order. In a do-file, always add the -modify- option, in case the label is already defined in some way.
2. Use -encode- to generate a new variable using the predefined mappings. Strings that are not found in the predefined value label are added to it, using unassigned numbers. Because code 3 is already reserved for the new category "root", none of the original strings will be assigned it.
3. Apply Nick's nice piece of code to recode the root crops to 3. You could use the -index()- function for a single string, but -inlist()- is more general. -index()- has some neat uses; this isn't one of them.
4. Any remaining codes (those above 3) belong to values we don't need, such as "not applicable" or other crops that don't interest us, so they are set to missing. -encode- will still have added them to the value label, where their numeric codes can be inspected easily by typing

. lab list crop
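The four steps can be checked end to end on a small dataset. This is a sketch that simply strings together the commands from the reply; the data are hypothetical, with "rice" and "not applicable" standing in for unwanted values.

```stata
* Hypothetical toy data (values assumed for illustration)
clear
input str16 crop
"wheat"
"barley"
"potato"
"cassava"
"yams"
"beet"
"rice"
"not applicable"
end

* Step 1: predefine the codes we want; -modify- copes with an existing label
label define crop 1 "wheat" 2 "barley" 3 "root", modify

* Step 2: encode using (and extending) the predefined label;
* strings not already in the label get new, unused codes
encode crop, generate(Crop) label(crop)

* Step 3: recode the root crops to the reserved code 3
replace Crop = 3 if inlist(crop, "potato", "cassava", "yams", "beet")

* Step 4: codes above 3 are unwanted strings, so set them to missing
replace Crop = . if Crop > 3

* Inspect the value label and the result
label list crop
tabulate Crop, missing
```

The final -tabulate- should show wheat, barley and root, with "rice" and "not applicable" missing, while -label list- still shows the extra codes that -encode- added.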


Ronan M Conroy ([email protected])
Senior Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
+353 1 402 2431 (fax 2764)
--------------------
Just say no to drug reps

