Stata 11 help for split

help split                                                  dialog:  split
---------------------------------------------------------------------------

Title

[D] split -- Split string variables into parts

Syntax

split strvar [if] [in] [, options]

    options                 description
    -------------------------------------------------------------------------
    Main
      generate(stub)        begin new variable names with stub; default is
                              strvar
      parse(parse_strings)  parse on specified strings; default is to parse
                              on spaces
      limit(#)              create a maximum of # new variables
      notrim                do not trim leading or trailing spaces of
                              original variable

    Destring
      destring              apply destring to new string variables,
                              replacing initial string variables with
                              numeric variables where possible
      ignore("chars")       remove specified nonnumeric characters
      force                 convert nonnumeric strings to missing values
      float                 generate numeric variables as type float
      percent               convert percent variables to fractional form
    -------------------------------------------------------------------------

Menu

Data > Create or change data > Other variable-transformation commands > Split string variables into parts

Description

split splits the contents of a string variable, strvar, into one or more parts, using one or more parse_strings (by default, blank spaces), so that new string variables are generated. Thus split is useful for separating "words" or other parts of a string variable. strvar itself is not modified.
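As a minimal sketch of the default behavior, the following do-file fragment builds a small dataset and splits on spaces. The variable name fullname and the data values are made up for illustration.

```stata
* Hypothetical data: one string variable holding two words per observation
clear
input str20 fullname
"Ada Lovelace"
"Alan Turing"
end

* Default parsing is on spaces; new variables take the stub fullname
split fullname

* fullname itself is unchanged and sits alongside fullname1 and fullname2
list
```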

Options

        +------+
    ----+ Main +-------------------------------------------------------------

generate(stub) specifies the beginning characters of the new variable names so that new variables stub1, stub2, etc., are produced. stub defaults to strvar.

parse(parse_strings) specifies that, instead of using spaces, parsing use one or more parse_strings. Most commonly, one string that is one punctuation character will be specified. For example, if parse(,) is specified, then "1,2,3" is split into "1", "2", and "3".

You can also specify 1) two or more strings that are alternative separators of "words" and 2) strings that consist of two or more characters. Alternative strings should be separated by spaces. Strings that include spaces should be bound by " ". Thus if parse(, " ") is specified, "1,2 3" is also split into "1", "2", and "3". Note particularly the difference between, say, parse(a b) and parse(ab): with the first, a and b are both acceptable as separators, whereas with the second, only the string ab is acceptable.
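The difference between parse(a b) and parse(ab) can be sketched with a toy dataset. The variables s and t, their contents, and the stubs alt and seq are assumptions chosen for illustration.

```stata
* Hypothetical data: s uses a and b as separators, t uses the string ab
clear
input str10 s str10 t
"1a2b3" "1ab2ab3"
end

* a and b are alternative one-character separators
split s, parse(a b) generate(alt)

* ab is a single two-character separator
split t, parse(ab) generate(seq)

* Both calls should yield the three parts "1", "2", and "3"
list
```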

limit(#) specifies an upper limit to the number of new variables to be created. Thus limit(2) specifies that, at most, two new variables be created.
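A brief sketch of limit() follows, assuming a made-up variable s that holds four space-separated words. Only the number of variables created is checked here.

```stata
* Hypothetical data: four words, so an unrestricted split would create four variables
clear
input str20 s
"a b c d"
end

* Create at most two new variables, s1 and s2
split s, limit(2)

* Save the count at once; describe is r-class and overwrites r()
local n = r(nvars)
display "number of new variables: `n'"

describe s*
```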

notrim specifies that the original string variable not be trimmed of leading and trailing spaces before being parsed. notrim is not compatible with parsing on spaces, because the latter implies that spaces in a string are to be discarded. You can either specify a parsing character, or, by default, allow a trim.
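The effect of notrim can be seen by splitting the same value twice with an explicit parsing character. This is a sketch; the variable s and the stubs trimmed and raw are hypothetical.

```stata
* Hypothetical data: a value with a leading and a trailing space
clear
set obs 1
generate str10 s = " a, b "

* Default: s is trimmed before parsing, so the first part has no leading space
split s, parse(,) generate(trimmed)

* notrim: the leading and trailing spaces of s are kept
split s, parse(,) generate(raw) notrim

list
```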

        +----------+
    ----+ Destring +---------------------------------------------------------

destring applies destring to the new string variables, replacing the variables initially created as strings by numeric variables where possible. See [D] destring.

ignore(), force, float, percent; see [D] destring.
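As an illustration of combining destring with ignore(), the sketch below splits on semicolons and strips commas used as thousands separators. The variable amounts, its contents, and the stub amt are assumptions.

```stata
* Hypothetical data: two amounts separated by a semicolon
clear
input str20 amounts
"1,000;2,500"
end

* Split on ";" and convert the parts to numeric, ignoring the commas
split amounts, parse(;) generate(amt) destring ignore(",")

* amt1 and amt2 should now be numeric
describe amt*
list
```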

Examples

1. Suppose that input is somehow misread as one string variable, say, when you copy and paste into the Data Editor, but data are space-separated:

. split var1, destring

2. Email addresses split at "@":

. split address, p(@)

3. Suppose that a string variable holds names of legal cases that should be split into variables for plaintiff and defendant. The separators could be " V ", " V. ", " VS ", and " VS. ". Note particularly the leading and trailing spaces in our detailing of separators: the first separator is " V ", for example, not "V", which would incorrectly split "GOLIATH V DAVID" into "GOLIATH ", " DA", and "ID". The alternative separators are given as the argument to parse():

. split case, p(" V " " V. " " VS " " VS. ")

Signs of problems would be the creation of more than two variables and any variable having blank values, so check:

. list case if case2 == ""

4. Suppose that a string variable contains fields separated by tabs. For example, insheet leaves tabs unchanged. Knowing that a tab is char(9), we can type

. split data, p(`=char(9)') destring

p(char(9)) would not work. The argument to parse() is taken literally, but evaluation of functions on the fly can be forced as part of macro substitution.

5. Suppose that a string variable contains substrings bound in parentheses, such as (1 2 3) (4 5 6). Here we can split on the right parentheses and, if desired, replace those afterward. For example,

. split data, p(")")
. foreach v in `r(varlist)' {
          replace `v' = `v' + ")"
. }

---------------------------------------------------------------------------
Setup
    . webuse splitxmpl

List the data
    . list

Split var1 into two string variables based on " " (space) as the parsing character
    . split var1

List the result
    . list

Drop newly created variables var11 and var12
    . drop var11 var12

Split var1 into two variables based on " " as the parsing character and name the variables geog1 and geog2
    . split var1, gen(geog)

List the result
    . list var1 geog*

---------------------------------------------------------------------------
Setup
    . webuse splitxmpl2, clear

List the data
    . list

Split var1 into two variables using comma as the parsing character and name the variables geog1 and geog2
    . split var1, parse(,) gen(geog)

List the result
    . list var1 geog*

---------------------------------------------------------------------------
Setup
    . webuse splitxmpl3, clear

List the data
    . list

Split date into variables using comma-followed-by-space and space as the parsing strings and use ndate as the prefix for the new variable names
    . split date, parse(", " " ") gen(ndate)

List the data
    . list

---------------------------------------------------------------------------
Setup
    . webuse splitxmpl4, clear

List the data
    . list

Split x into variables using comma as the parsing character, and try to replace new string variables with numeric variables
    . split x, parse(,) destring

List the data
    . list

Describe the data
    . describe
---------------------------------------------------------------------------

Saved results

split saves the following in r():

    Scalars
      r(nvars)      number of new variables created

    Macros
      r(varlist)    names of newly created variables
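One way to use these results is sketched below. r(varlist) holds a list of names, so it is read with macro quotes. The data, the stub part, and the local macro names n and newvars are hypothetical.

```stata
* Hypothetical data: observations with different numbers of words
clear
input str20 s
"a b c"
"d e"
end

split s, generate(part)

* Copy the results right away; later r-class commands overwrite r()
local n = r(nvars)
local newvars `r(varlist)'

display "created `n' variables: `newvars'"

* Count empty values in each new variable (count is r-class)
foreach v of local newvars {
    count if `v' == ""
}
```

Copying r(nvars) and r(varlist) into local macros first keeps them available after count has replaced the contents of r().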

Also see

Manual: [D] split

Help: [D] destring, [D] egen, [D] functions, [D] rename, [D] separate


© Copyright 1996–2009 StataCorp LP