help split dialog: split
-------------------------------------------------------------------------------
Title
[D] split -- Split string variables into parts
Syntax
split strvar [if] [in] [, options]
options                 description
-------------------------------------------------------------------------
Main
  generate(stub)        begin new variable names with stub; default is
                          strvar
  parse(parse_strings)  parse on specified strings; default is to parse
                          on spaces
  limit(#)              create a maximum of # new variables
  notrim                do not trim leading or trailing spaces of
                          original variable

Destring
  destring              apply destring to new string variables,
                          replacing initial string variables with
                          numeric variables where possible
  ignore("chars")       remove specified nonnumeric characters
  force                 convert nonnumeric strings to missing values
  float                 generate numeric variables as type float
  percent               convert percent variables to fractional form
-------------------------------------------------------------------------
Menu
Data > Create or change data > Other variable-transformation commands >
Split string variables into parts
Description
split splits the contents of a string variable, strvar, into one or more
parts, using one or more parse_strings (by default, blank spaces), so
that new string variables are generated. Thus split is useful for
separating "words" or other parts of a string variable. strvar itself is
not modified.
Options
+------+
----+ Main +-------------------------------------------------------------
generate(stub) specifies the beginning characters of the new variable
names so that new variables stub1, stub2, etc., are produced. stub
defaults to strvar.
parse(parse_strings) specifies that, instead of using spaces, parsing use
one or more parse_strings. Most commonly, one string that is one
punctuation character will be specified. For example, if parse(,) is
specified, then "1,2,3" is split into "1", "2", and "3".
You can also specify 1) two or more strings that are alternative
separators of "words" and 2) strings that consist of two or more
characters. Alternative strings should be separated by spaces.
Strings that include spaces should be bound by " ". Thus if
parse(, " ") is specified, "1,2 3" is also split into "1", "2", and
"3". Note particularly the difference between, say, parse(a b) and
parse(ab): with the first, a and b are both acceptable as separators,
whereas with the second, only the string ab is acceptable.
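A minimal sketch of the difference between parse(a b) and parse(ab), using
a small made-up dataset (the variable s and the stubs p and q are
illustrative names, not part of any shipped example):

    . clear
    . input str8 s
      "1a2b3"
      "1ab2"
      end
    . split s, parse(a b) generate(p)
    . split s, parse(ab) generate(q)
    . list

With parse(a b), "1a2b3" should be split into "1", "2", and "3" because
either character acts as a separator. With parse(ab), "1a2b3" should be
left whole in q1, and only "1ab2" is split, into "1" and "2".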
limit(#) specifies an upper limit to the number of new variables to be
created. Thus limit(2) specifies that, at most, two new variables be
created.
notrim specifies that the original string variable not be trimmed of
leading and trailing spaces before being parsed. notrim is not
compatible with parsing on spaces, because parsing on spaces implies
that spaces in a string are to be discarded. Either specify a parsing
string other than a space, or omit notrim and accept the default
trimming.
+----------+
----+ Destring +---------------------------------------------------------
destring applies destring to the new string variables, replacing the
variables initially created as strings by numeric variables where
possible. See [D] destring.
ignore(), force, float, percent; see [D] destring.
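As a hedged illustration of combining destring with force, the following
sketch uses a made-up variable x in which one field is not numeric:

    . clear
    . input str10 x
      "1,2,3"
      "4,n/a,6"
      end
    . split x, parse(,) destring force
    . list
    . describe

Without force, the new variable holding "n/a" would be expected to stay a
string variable; with force, it should become numeric, with the nonnumeric
entry converted to missing. Because force discards information, it is
worth listing the result before dropping the original variable.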
Examples
1. Suppose that input is somehow misread as one string variable, say,
when you copy and paste into the Data Editor, but data are
space-separated:
. split var1, destring
2. Email addresses split at "@":
. split address, p(@)
3. Suppose that a string variable holds names of legal cases that should
be split into variables for plaintiff and defendant. The separators
could be " V ", " V. ", " VS ", and " VS. ". Note particularly the
leading and trailing spaces in our detailing of separators: the
first separator is " V ", for example, not "V", which would
incorrectly split "GOLIATH V DAVID" into "GOLIATH ", " DA", and "ID".
The alternative separators are given as the argument to parse():
. split case, p(" V " " V. " " VS " " VS. ")
Signs of problems would be the creation of more than two variables
and any variable having blank values, so check:
. list case if case2 == ""
4. Suppose that a string variable contains fields separated by tabs. For
example, insheet leaves tabs unchanged. Knowing that a tab is
char(9), we can type
. split data, p(`=char(9)') destring
p(char(9)) would not work. The argument to parse() is taken
literally, but evaluation of functions on the fly can be forced as
part of macro substitution.
5. Suppose that a string variable contains substrings bound in
parentheses, such as (1 2 3) (4 5 6). Here we can split on the right
parentheses and, if desired, replace those afterward. For example,
. split data, p(")")
. foreach v in `r(varlist)' {
        replace `v' = `v' + ")"
. }
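Example 4 can be tried without an external file. The sketch below builds
one observation with tab-separated fields (the variable name data follows
example 4; the contents are made up for illustration):

    . clear
    . set obs 1
    . generate str20 data = "1" + char(9) + "2" + char(9) + "3"
    . split data, parse(`=char(9)') destring
    . list data1 data2 data3

Here `=char(9)' is evaluated during macro substitution, so parse()
receives a literal tab character rather than the text char(9).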
---------------------------------------------------------------------------
Setup
. webuse splitxmpl
List the data
. list
Split var1 into two string variables based on " " (space) as the parsing
character
. split var1
List the result
. list
Drop newly created variables var11 and var12
. drop var11 var12
Split var1 into two variables based on " " as the parsing character and
name the variables geog1 and geog2
. split var1, gen(geog)
List the result
. list var1 geog*
---------------------------------------------------------------------------
Setup
. webuse splitxmpl2, clear
List the data
. list
Split var1 into two variables using comma as the parsing character and
name the variables geog1 and geog2
. split var1, parse(,) gen(geog)
List the result
. list var1 geog*
---------------------------------------------------------------------------
Setup
. webuse splitxmpl3, clear
List the data
. list
Split date into variables using comma-followed-by-space and space as the
parsing characters and use ndate as the prefix for the new variable names
. split date, parse(", " " ") gen(ndate)
List the data
. list
---------------------------------------------------------------------------
Setup
. webuse splitxmpl4, clear
List the data
. list
Split x into variables using comma as the parsing character, and try to
replace new string variables with numeric variables
. split x, parse(,) destring
List the data
. list
Describe the data
. describe
---------------------------------------------------------------------------
Saved results
split saves the following in r():
Scalars
  r(nvars)    number of new variables created

Macros
  r(varlist)  names of newly created variables
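The saved results can be inspected immediately after split, before another
r-class command overwrites them. A short sketch using the splitxmpl
dataset from the examples above:

    . webuse splitxmpl, clear
    . split var1, generate(geog)
    . return list
    . display r(nvars)
    . display "`r(varlist)'"

Given the earlier example, r(varlist) should contain geog1 and geog2, and
r(nvars) should equal the number of names in that list.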
Also see
Manual: [D] split
Help: [D] destring, [D] egen, [D] functions, [D] rename, [D] separate