Stata 15 help for recode

[D] recode -- Recode categorical variables

Syntax

Basic syntax

    recode varlist (rule) [(rule) ...] [, generate(newvar)]

Full syntax

    recode varlist (erule) [(erule) ...] [if] [in] [, options]

where the most common forms for rule are

    +----------------------------------------------------------+
    | rule           | Example     | Meaning                   |
    |----------------+-------------+---------------------------|
    | # = #          | 3 = 1       | 3 recoded to 1            |
    | # # = #        | 2 . = 9     | 2 and . recoded to 9      |
    | #/# = #        | 1/5 = 4     | 1 through 5 recoded to 4  |
    | nonmissing = # | nonmiss = 8 | all other nonmissing to 8 |
    | missing = #    | miss = 9    | all other missings to 9   |
    +----------------------------------------------------------+

where erule has the form

element [element ...] = el ["label"]

nonmissing = el ["label"]

missing = el ["label"]

else | * = el ["label"]

element has the form

el | el/el

and el is

# | min | max

The keyword rules missing, nonmissing, and else must be the last rules specified. else may not be combined with missing or nonmissing.

    options             Description
    -------------------------------------------------------------------------
    Options
      generate(newvar)  generate newvar containing transformed variables;
                          default is to replace existing variables
      prefix(str)       generate new variables with str prefix
      label(name)       specify a name for the value label defined by the
                          transformation rules
      copyrest          copy out-of-sample values from original variables
      test              test that rules are invoked and do not overlap
    -------------------------------------------------------------------------

Menu

Data > Create or change data > Other variable-transformation commands > Recode categorical variable

Description

recode changes the values of numeric variables according to the rules specified. Values that do not meet any of the conditions of the rules are left unchanged, unless a catch-all keyword rule (nonmissing, missing, or else) is specified.

A range #1/#2 refers to all (real and integer) values between #1 and #2, including the boundaries #1 and #2. This interpretation of #1/#2 differs from that in numlists.
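The inclusive, real-valued reading of a range can be checked with a small sketch. The dataset and the variable names x and y below are made up for illustration; they are not part of any shipped example data.

```stata
* Hypothetical four-observation dataset with one noninteger value
clear
input x
1
1.5
2
3
end

* 1/2 covers every value from 1 through 2, so 1.5 is recoded as well
recode x (1/2 = 0), generate(y)
list x y
```

In a numlist, by contrast, 1/2 would stand for the integers 1 and 2 only.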

min and max provide a convenient way to refer to the minimum and maximum for each variable in varlist and may be used in both the from-value and the to-value parts of the specification. Combined with if and in, the minimum and maximum are determined over the restricted dataset.
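A minimal sketch of min and max in the from-value part of a rule, using an invented variable x that holds 1 through 6. The names grp and z are illustrative assumptions.

```stata
* Hypothetical data: x takes the values 1 through 6
clear
input x
1
2
3
4
5
6
end

* Split x at 3/4 without hard-coding its lowest and highest values
recode x (min/3 = 1) (4/max = 2), generate(grp)

* With an if qualifier, min is taken over the selected observations only,
* so here min refers to 3, not to 1
recode x (min = 0) if x >= 3, generate(z)
list x grp z
```

Because generate() is used without copyrest, z is missing for the observations excluded by the if qualifier.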

The keyword rules specify transformations for values not changed by the previous rules:

    nonmissing   all nonmissing values not changed by the rules
    missing      all missing values (., .a, .b, ..., .z) not changed by the rules
    else         all nonmissing and missing values not changed by the rules
    *            synonym for else
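The keyword rules can be sketched as follows, assuming a small invented variable x that contains both regular values and two kinds of missing values.

```stata
* Hypothetical data with a system missing (.) and an extended missing (.a)
clear
input x
1
2
5
.
.a
end

* Explicit rule first, keyword rules last:
*   1 and 2           -> 1
*   other nonmissing  -> 2
*   any missing value -> 9
recode x (1 2 = 1) (nonmissing = 2) (missing = 9), generate(y)
list x y
```

The keyword rules act only on values that the earlier rule did not change, which is why they must come last.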

recode provides a convenient way to define value labels for the generated variables during the definition of the transformation, reducing the risk of inconsistencies between the definition and value labeling of variables. Value labels may be defined for integer values and for the extended missing values (.a, .b, ..., .z), but not for noninteger values or for sysmiss (.).
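As a sketch of labeling within the rules, the following recodes an invented variable x and labels the results in the same command. The label name ylbl and the label texts are illustrative assumptions.

```stata
* Hypothetical data with one system missing value
clear
input x
1
2
3
.
end

* Each rule carries its own label; the extended missing value .a may be
* labeled, whereas sysmiss (.) may not
recode x (1 2 = 1 "Low")                ///
         (3 = 2 "High")                 ///
         (missing = .a "Not recorded"), ///
         generate(y) label(ylbl)

label list ylbl
list x y
```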

Although this is not shown in the syntax diagram, the parentheses around the rules and keyword clauses are optional if you transform only one variable and if you do not define value labels.

Options

        +---------+
    ----+ Options +----------------------------------------------------------

generate(newvar) specifies the names of the variables that will contain the transformed variables. into() is a synonym for generate(). Values outside the range implied by if or in are set to missing (.), unless the copyrest option is specified.

If generate() is not specified, the input variables are overwritten; values outside the if or in range are not modified. Overwriting variables is dangerous (you cannot undo changes, value labels may be wrong, etc.), so we strongly recommend specifying generate().

prefix(str) specifies that the recoded variables be returned in new variables formed by prefixing the names of the original variables with str.

label(name) specifies a name for the value label defined from the transformation rules. label() may be specified only with generate() (or its synonym, into()) or prefix(). If a variable is recoded, the label name defaults to newvar unless a label with that name already exists.

copyrest specifies that out-of-sample values be copied from the original variables. In line with other data management commands, recode defaults to setting newvar to missing (.) outside the observations selected by if exp and in range.
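The difference that copyrest makes can be sketched with an invented dataset in which g marks the observations to recode. All variable names here are illustrative.

```stata
* Hypothetical data: recode x only where g == 1
clear
input x g
1 1
2 1
1 0
3 0
end

* Default: observations outside the if condition are set to missing in y
recode x (1 = 10) if g == 1, generate(y)

* With copyrest: observations outside the if condition keep the value of x
recode x (1 = 10) if g == 1, generate(z) copyrest

list x g y z
```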

test specifies that Stata check whether every rule is invoked and whether any rules overlap; for example, the rules (1/5=1) (3=2) overlap because 3 falls within 1/5.

Examples

---------------------------------------------------------------------------
Setup
    . webuse recxmpl

List the data
    . list

For x, change 1 to 2, leave all other values unchanged, and store the results in nx
    . recode x (1 = 2), gen(nx)

List the result
    . list x nx

For x1, swap 1 and 2, and store the results in nx1
    . recode x1 (1 = 2) (2 = 1), gen(nx1)

List the result
    . list x1 nx1

For x2, collapse 1 and 2 into 1, change 3 to 2, change 4 through 7 to 3, and store the results in nx2
    . recode x2 (1 2 = 1) (3 = 2) (4/7 = 3), gen(nx2)

List the result
    . list x2 nx2

For x1, x2, and x3, change the direction of 1, 2, ..., 8, moving 8 to 1, 7 to 2, etc., and store the transformed variables in newx1, newx2, and newx3
    . recode x1-x3 (1=8) (2=7) (3=6) (4=5) (5=4) (6=3) (7=2) (8=1), pre(new) test

List the result
    . list x1 newx1 x2 newx2 x3 newx3

---------------------------------------------------------------------------
Setup
    . webuse fullauto, clear

For rep77 and rep78, collapse 1 and 2 into 1, change 3 to 2, collapse 4 and 5 into 3, store results in newrep77 and newrep78, and define a new value label newrep
    . recode rep77 rep78 (1 2 = 1 "Below average") (3 = 2 Average) (4 5 = 3 "Above average"), pre(new) label(newrep)

List the old and new value label
    . label list repair newrep

List some of the data
    . list *rep77 *rep78 in 1/10, nolabel
---------------------------------------------------------------------------

Tip: long recode commands may conveniently be written using the line continuation ///. For example,

    . recode x y (1 2 = 1 low)                      ///
                 (3 = 2 medium)                     ///
                 (4 5 = 3 high)                     ///
                 (nonmissing = 9 "something else")  ///
                 (missing = .)                      ///
                 , gen(Rx Ry) label(Cat3)

Video example

How to create a categorical variable from a continuous variable


© Copyright 1996–2018 StataCorp LLC