Stata 15 help for _mkcross


[D] _mkcross -- Cross variables with automatic short value labels


_mkcross varlist [if] [in], options

options               Description
-------------------------------------------------------------------------
generate(newvar)      required; name of the value-labeled "crossed"
                        variable identifying combinations of varlist
labelname(name)       name of value label for newvar; default is newvar
missing               treat missing values in crossing variables as
                        ordinary values
keyword               code missing values with keywords
strok                 string variables are allowed
coding(matname)       returns a coding matrix
length(#)             truncate codes of crossing variables at length #
length(minimal)       generate minimal-length unique codes for crossing
                        variables
sep(str)              separator between codes of crossing variables;
                        default is sep("_")
maxlength(#)          maximum crossed code length; default is maxlength(12)
start(#)              starting index for group values

edit(space)           drop spaces from coding strings
edit(first)           derive codes from first word in coding string
edit(vowel)           drop vowels and spaces from coding string

case(lower)           convert coding strings to lowercase
case(upper)           convert coding strings to uppercase
case(first)           capitalize each word in coding strings
-------------------------------------------------------------------------
report(variables)     display the coding for the nonnumeric crossing
                        variables
report(crossed)       display the codes (value labels) for the crossed
                        variable
report(all)           display the coding of the crossing and crossed
                        variables

truncate(#)           maximum length for descriptions in crossed variable
                        report table
-------------------------------------------------------------------------


_mkcross creates one "crossed" variable taking on values 1, 2, ... for the groups formed by a varlist of up to six "crossing" variables. The groups are numbered in the sort order of varlist, which is identical to the numbering produced by egen's group() function.
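The numbering rule can be sketched in Python. The helper group_values is hypothetical and is not part of _mkcross or Stata; it assumes every observation has nonmissing values and simply numbers the distinct combinations in sorted order, so the first crossing variable varies slowest. The start argument mirrors the start() option described below.

```python
def group_values(rows: list[tuple], start: int = 1) -> list[int]:
    """Assign each observation the index of its combination of crossing values.

    Hypothetical sketch: missing values are not handled, and Python's tuple
    ordering stands in for Stata's sort order of varlist.
    """
    # Distinct combinations in sorted order (first variable varies slowest).
    distinct = sorted(set(rows))
    index = {combo: start + i for i, combo in enumerate(distinct)}
    return [index[r] for r in rows]


# Four observations on two crossing variables.
print(group_values([(2, 1), (1, 1), (1, 3), (2, 1)]))           # [3, 1, 2, 3]
print(group_values([(2, 1), (1, 1), (1, 3), (2, 1)], start=0))  # [2, 0, 1, 2]
```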

By default, the crossing variables are coded at equal length; the number of characters used to code each variable depends on the number of crossing variables, on maxlength(), and on the length of the separator string. For instance, with two variables, the default maxlength(12), and the default separator ("_"), each variable is coded at length 5, and the value labels of the crossed variable have at most 11 characters. With three variables, each is coded at length 3, and so on. A warning is displayed if the coding strings are not unique; for instance, the length-4 codes for "Australia" and "Austria" are both "Aust".
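The arithmetic above is consistent with a simple rule, assumed here rather than taken from the _mkcross source: subtract the (nvar - 1) separators from maxlength and divide the remainder equally among the nvar crossing variables, rounding down. The Python helpers below (default_code_length, max_label_length, non_unique_codes) are hypothetical names used only for illustration; they reproduce the two- and three-variable examples and the "Australia"/"Austria" collision.

```python
def default_code_length(nvar: int, maxlength: int = 12, sep: str = "_") -> int:
    # Assumed rule: share what is left after the separators equally, rounding down.
    return (maxlength - (nvar - 1) * len(sep)) // nvar


def max_label_length(nvar: int, maxlength: int = 12, sep: str = "_") -> int:
    # nvar codes of equal length plus (nvar - 1) separators.
    return nvar * default_code_length(nvar, maxlength, sep) + (nvar - 1) * len(sep)


def non_unique_codes(values: list[str], length: int) -> set[str]:
    """Truncated codes shared by two or more distinct values (would trigger a warning)."""
    owners: dict[str, set[str]] = {}
    for v in values:
        owners.setdefault(v[:length], set()).add(v)
    return {code for code, vs in owners.items() if len(vs) > 1}


print(default_code_length(2), max_label_length(2))  # 5 11
print(default_code_length(3), max_label_length(3))  # 3 11
print(non_unique_codes(["Australia", "Austria", "Norway"], 4))  # {'Aust'}
```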

_mkcross allows extensive control of how value labels of the crossed variable are defined from the codes (string values, value labels, numeric values) of the crossing variables.


generate(newvar) is required; it specifies the name of the value-labeled "crossed" variable identifying combinations of the varlist.

labelname(lname) specifies the name to be given to the value label created to hold the labels for newvar. The default value label name is newvar. lname should not already exist as a value label.

missing indicates that missing values in varlist (., .a, etc., for numeric variables, or the empty string "" for string variables) are to be treated like any other value when assigning groups, rather than causing the observation to be assigned a missing group value.

keyword codes missing values with keywords: . as dot, .a as dota, .b as dotb, and so on.
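The keyword mapping is mechanical, as this small Python sketch shows. keyword_code is a hypothetical helper, not part of _mkcross; it relies only on the fact that Stata's missing values are . and the extended missing values .a through .z.

```python
import string


def keyword_code(value: str) -> str:
    """Map a Stata missing-value literal to its keyword: '.' -> 'dot', '.a' -> 'dota'."""
    if value == ".":
        return "dot"
    # Extended missing values run from .a to .z.
    if len(value) == 2 and value[0] == "." and value[1] in string.ascii_lowercase:
        return "dot" + value[1]
    raise ValueError(f"not a Stata missing value: {value!r}")


print([keyword_code(v) for v in [".", ".a", ".b"]])  # ['dot', 'dota', 'dotb']
```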

strok specifies that crossing variables may be string variables.

coding(matname) specifies that an ncat x nvar coding matrix be returned in matname, where ncat is the number of distinct values of the crossed variable newvar and nvar is the number of crossing variables. The row names of matname are the values of newvar, 1, 2, ..., unless the start() option is specified. The coding() option is not allowed with string variables.
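Conceptually, the coding matrix has one row per distinct combination (group) and one column per crossing variable, holding that combination's values. The Python sketch below builds this structure as plain lists; coding_matrix is a hypothetical name, and the sketch does not reproduce how Stata stores or names the matrix.

```python
def coding_matrix(rows: list[tuple], start: int = 1) -> tuple[list[int], list[list]]:
    """Return (row names, matrix): one row per group, one column per crossing variable.

    Hypothetical sketch for numeric crossing variables without missing values.
    """
    distinct = sorted(set(rows))  # groups in sort order of varlist
    rownames = list(range(start, start + len(distinct)))
    matrix = [list(combo) for combo in distinct]
    return rownames, matrix


names, mat = coding_matrix([(2, 1), (1, 1), (2, 1), (3, 3)])
print(names)  # [1, 2, 3]
print(mat)    # [[1, 1], [2, 1], [3, 3]]
```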

length(#) truncates the codes of crossing variables at length #. Numeric variables without value labels are coded at equal length, padded with zeros.

length(minimal) produces unique codes of minimal length.

For value-labeled numeric and string variables, the coding uses the left-most characters. Examples:

"male", "female"                    --> "m", "f"
"Netherlands", "Nigeria", "Norway"  --> "Ne", "Ni", "No"

Minimal unique codes for numeric variables are determined right to left. Examples:

2000, 2001, 2002, 2004  --> 0, 1, 2, 4
1999, 2000, 2001, 2002  --> 9, 0, 1, 2
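Both directions of minimal coding can be sketched by increasing a single common code length until the codes are unique, taking characters from the left for labels and strings and from the right for plain numbers. minimal_codes is a hypothetical Python helper; it assumes one common length per variable, which is what the examples above show, and is not _mkcross's actual implementation.

```python
def minimal_codes(values: list, from_right: bool = False) -> list[str]:
    """Shortest common-length codes that still distinguish all distinct values."""
    strs = [str(v) for v in values]
    maxlen = max(len(s) for s in strs)
    for length in range(1, maxlen + 1):
        codes = [s[-length:] if from_right else s[:length] for s in strs]
        if len(set(codes)) == len(set(strs)):
            return codes
    return strs  # not reached: at maxlen every code is the full string


print(minimal_codes(["male", "female"]))                     # ['m', 'f']
print(minimal_codes(["Netherlands", "Nigeria", "Norway"]))   # ['Ne', 'Ni', 'No']
print(minimal_codes([2000, 2001, 2002, 2004], from_right=True))  # ['0', '1', '2', '4']
print(minimal_codes([1999, 2000, 2001, 2002], from_right=True))  # ['9', '0', '1', '2']
```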

sep(str) specifies a separator string str between the codes for crossing variables. The default is sep("_").

maxlength(#) specifies the maximum length of the value labels (codes) of the crossed variable newvar. The default is maxlength(12).

start(#) specifies the starting index for the group values. The default, start(1), creates group values 1, 2, ...; start(0) creates values 0, 1, 2, ....

edit(opt) performs various code manipulations. Editing occurs before extracting subcodes or determining minimal unique subcodes (see option length(minimal)).

edit(space) drops all spaces from codes.

edit(first) selects the first word of codes.

edit(vowel) drops vowels and spaces from codes.
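The three edits are simple string transformations, sketched below in Python. edit_code is a hypothetical helper, and treating only a, e, i, o, u (in either case) as vowels is an assumption; this help text does not say whether "y" or non-ASCII letters count.

```python
def edit_code(s: str, how: str) -> str:
    """Apply an edit(space|first|vowel)-style transformation to a coding string."""
    if how == "space":
        return s.replace(" ", "")
    if how == "first":
        words = s.split()
        return words[0] if words else ""
    if how == "vowel":
        # Assumption: vowels are a, e, i, o, u in either case; spaces are dropped too.
        return "".join(ch for ch in s if ch not in " aeiouAEIOU")
    raise ValueError(f"unknown edit: {how!r}")


for how in ("space", "first", "vowel"):
    print(how, edit_code("democratic party", how))
# space democraticparty
# first democratic
# vowel dmcrtcprty
```

Because editing happens before subcodes are extracted, a length-5 code for "democratic party" after edit(vowel) would be "dmcrt" rather than "democ".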

case(opt) modifies the case of the codes of the crossing variables. Case modification occurs before extracting subcodes or determining minimal unique subcodes (see option length(minimal)).

case(lower) converts codes to lowercase.

case(upper) converts codes to uppercase.

case(first) converts codes to lowercase, except for the first character of each word, which is converted to uppercase.
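A Python sketch of the three case conversions follows; case_code is a hypothetical helper. For case(first) it splits on spaces explicitly instead of using str.title(), because str.title() also capitalizes after apostrophes and digits (for example, "o'neil" becomes "O'Neil"), whereas the option is described as capitalizing only the first character of each word.

```python
def case_code(s: str, how: str) -> str:
    """Apply a case(lower|upper|first)-style conversion to a coding string."""
    if how == "lower":
        return s.lower()
    if how == "upper":
        return s.upper()
    if how == "first":
        # Capitalize the first character of each space-separated word, lowercase the rest.
        return " ".join(w[:1].upper() + w[1:].lower() for w in s.split(" "))
    raise ValueError(f"unknown case: {how!r}")


print(case_code("republican PARTY", "first"))  # Republican Party
print(case_code("Netherlands", "upper"))       # NETHERLANDS
```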

report(opt) displays a report of the construction of the coding.

report(variables) displays a coding table for the crossing variables in varlist.

report(crossed) displays a coding table (value labels) for the crossed variable newvar.

report(all) displays the coding tables of the crossing and crossed variables.

truncate(#) truncates the full descriptions in the report table for the crossed variable to # characters. The default is truncate(24); the maximum allowed is truncate(32). truncate() affects only how descriptions are displayed in the report, not the value labels that are actually created.


Suppose you have two value-labeled variables, relig and party:

relig                  party
1  none                1  democratic party
2  protestant          2  republican party
3  catholic            3  independent
4  islam
5  other

To form the crossed variable relpa from all combinations of relig and party, type

. _mkcross relig party, gen(relpa)

relpa has 15 values (unless some combinations do not occur in the data) with value labels of up to 11 characters; for instance, if all combinations occur, (relig=2, party=1) has group value 4 and value label "prote_democ".

. _mkcross relig party, gen(relpa) length(3)

produces the same grouping variable relpa, but with shorter value labels: (relig=2, party=1) now has value label "pro_dem". Here, minimal coding requires only one character for each of relig and party, so

. _mkcross relig party, gen(relpa) length(min)

generates the value label "p_d" for (relig=2, party=1). These value labels may take some getting used to, but they are quite useful, especially in plots with many value-labeled plot points.
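The three labelings in this example can be reproduced with a short Python sketch. It hard-codes the value labels of relig and party shown above, assumes all 15 combinations occur, and builds labels by plain left truncation joined with "_"; crossed_labels and minimal_prefix_length are hypothetical helpers, not _mkcross internals. Under the sort-order rule, (relig=2, party=1) is the fourth combination.

```python
from itertools import product

RELIG = {1: "none", 2: "protestant", 3: "catholic", 4: "islam", 5: "other"}
PARTY = {1: "democratic party", 2: "republican party", 3: "independent"}


def minimal_prefix_length(labels: list[str]) -> int:
    """Smallest common prefix length that keeps all labels distinct."""
    for n in range(1, max(map(len, labels)) + 1):
        if len({s[:n] for s in labels}) == len(set(labels)):
            return n
    return max(map(len, labels))


def crossed_labels(len_r: int, len_p: int, sep: str = "_") -> dict:
    """Map (relig, party) -> (group value, value label), assuming every combination occurs."""
    out = {}
    for g, (r, p) in enumerate(product(sorted(RELIG), sorted(PARTY)), start=1):
        out[(r, p)] = (g, RELIG[r][:len_r] + sep + PARTY[p][:len_p])
    return out


n_r = minimal_prefix_length(list(RELIG.values()))
n_p = minimal_prefix_length(list(PARTY.values()))
print(crossed_labels(5, 5)[(2, 1)])      # (4, 'prote_democ')  default coding
print(crossed_labels(3, 3)[(2, 1)])      # (4, 'pro_dem')      length(3)
print(crossed_labels(n_r, n_p)[(2, 1)])  # (4, 'p_d')          length(min)
```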

© Copyright 1996–2018 StataCorp LLC