{smcl}
{hline}
help for {hi:vtokenize}{right: Bill Rising}
{hline}

{hi:Split All Observations of a Variable into Tokens}

{* put the syntax in what follows. Don't forget to use [ ] around optional items}
{p 8 14}
{cmd:vtokenize} {it:varname}
[{cmd:if} {it:exp}]
[{cmd:in} {it:range}]
[{cmd:,}
{cmd:stub(}{it:stub}{cmd:)}
{cmdab:p:arse(}{it:delimiters}{cmd:)}
{cmd:nospace}
{cmdab:nodelim:iters}
{cmdab:nice:names}]
{p_end}

{title:Description}

{p}
Splits {it:varname} into its component tokens, generating as many new
variables as needed. The original variable is left untouched. Useful for
working with truly nasty text files.
{p_end}

{title:Options}

{p 0 4}{cmd:stub(}{it:stub}{cmd:)} specifies the start of the names of the
variables which will be generated. If omitted, the stub is the name of the
variable being split.
{p_end}

{p 0 4}{cmd:parse(}{it:delimiters}{cmd:)} gives the list of delimiters used to
separate tokens. If omitted, the only delimiter is whitespace (one or more
spaces). There is no need to specify space as a delimiter, though explicitly
specifying it will not cause problems.
{p_end}

{p 0 4}{cmd:nospace} is used to {bf:prevent} spaces from being used as
delimiters.
{p_end}

{p 0 4}{cmd:nodelimiters} is used to {bf:prevent} delimiters from being stored
as tokens. Note that, just as with {help gettoken}, spaces are never kept as
tokens.
{p_end}

{p 0 4}{cmd:nicenames} pads the numeric suffixes with leading zeros so that
the variable names sort properly in alphabetical order: the suffixes are
_1, _2, ... or _01, _02, ... or _001, _002, ..., depending on the number of
variables generated. If unspecified, the suffixes are unpadded:
_1, _2, ..., _10, _11, .... Padded names sort alphabetically; unpadded names
are easier to postprocess.
{p_end}

{title:Example(s)}

{p 8 12}{inp:. vtokenize foo}{break}
Splits {it:foo} into words by breaking on space(s), storing the first word for
each observation in {it:foo_1}, the second word in {it:foo_2}, etc.
{it:foo} itself is not altered.
{p_end}

{p 8 12}{inp:. vtokenize foo, stub(bar)}{break}
Splits {it:foo} into words by breaking on space(s), storing the first word for
each observation in {it:bar_1}, the second word in {it:bar_2}, etc.
{p_end}

{p 8 12}{inp:. vtokenize foo, stub(bar) parse(":") nospace}{break}
Splits {it:foo} into tokens by breaking on colons (:) only, storing the first
token for each observation in {it:bar_1}, the second token in {it:bar_2}, etc.
The colons themselves {bf:are} treated as tokens.
{p_end}

{p 8 12}{inp:. vtokenize foo, stub(bar) parse(":") nospace nodelimiters}{break}
Splits {it:foo} into tokens by breaking on colons (:) only, storing the first
token for each observation in {it:bar_1}, the second token in {it:bar_2}, etc.
The colons themselves are {bf:not} treated as tokens.
{p_end}

{p 8 12}{inp:. vtokenize foo, stub(bar) parse(":")}{break}
Splits {it:foo} into tokens by breaking on colons (:) and spaces, storing the
first token for each observation in {it:bar_1}, the second token in
{it:bar_2}, etc. The colons themselves are treated as tokens, but the spaces
are not.
{p_end}

{title:Notes}

{p}
When doing error checking, {cmd:vtokenize} checks only whether the variable
{it:stub_*1} exists. Thus, it will die ungracefully if, say, {it:stub_3}
already exists but it needs to generate its own {it:stub_3}.
{p_end}

{title:Also see}

{p}
{help tokenize}, {help gettoken}, {help vgettoken}
{p_end}

{title:Author}

        Bill Rising
        email: {browse "mailto:brising@louisville.edu":brising@louisville.edu}
        web: {browse "http://www.louisville.edu/~wrrisi01":http://www.louisville.edu/~wrrisi01}
        snailmail: Department of Bioinformatics and Biostatistics
                   University of Louisville
                   Louisville, KY 40292

{title:Last Updated}: January 9, 2004 @ 14:00:23