.-
help for ^tabsplit^
.-

Tabulate and generate string variables split into parts
-------------------------------------------------------

    ^tabsplit^ strvar [^if^ exp] [^in^ range] [, ^g^enerate^(^stub^)^
    ^gmax(^#^) h^eader^(^headerstr^) p^unct^(^punctchars^) s^ort
    ^sa^ving^(^filename^)^ ]

Description
-----------

^tabsplit^ tabulates frequencies of occurrence of the parts of a string
variable.

By default, the parts of a string are separated by spaces. The parts of
^"1 2 3"^ are thus ^"1"^, ^"2"^ and ^"3"^.

Optionally, alternative punctuation characters may be specified. The
parts of ^"1,2,3"^ with ^p(,)^ are, again, ^"1"^, ^"2"^ and ^"3"^. The
parts of ^"1 2 3"^ with ^p(,)^ are just the single part ^"1 2 3"^.
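
As a minimal sketch, suppose a string variable ^codes^ (a hypothetical
name) holds ^"1,2,3"^ in observation 1 and ^"2,3"^ in observation 2:

    . ^clear^
    . ^set obs 2^
    . ^gen str5 codes = "1,2,3"^
    . ^replace codes = "2,3" in 2^
    . ^tabsplit codes, p(,)^

Going by the description above, the table should show ^"2"^ and ^"3"^
twice each and ^"1"^ once.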


Remarks
-------

Leading and trailing spaces are ignored by ^tabsplit^. Thus, string
values that equal one or more spaces are treated just as if they were
missing. Likewise, with ^" 1,  2,   3"^ and ^p(,)^ the parts are ^"1"^,
^"2"^ and ^"3"^.

The idea of a part generalises Stata's concept of a word:

    ^. local words "Stata for data analysis"^
    ^. local word4 : word 4 of `words'^

puts the string ^"analysis"^ in local macro ^word4^.

There is another way to get just the first part of a string. It is
shown here with a space ^" "^ as separator, but it works with any other
single separator:

    .^ gen str1 Make = ""^
    .^ replace Make = substr(make,1,index(make," ")-1)^
    .^ replace Make = make if Make == ""^

With this method you need not know in advance how long a string type is
needed: with ^replace^, Stata automatically changes the variable type
if necessary.
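
As an illustration of the same idiom with a different separator, the
lines below take the part before the first comma. The variable
^reasons^ and the new name ^first^ are assumptions made for this sketch:

    . ^gen str1 first = ""^
    . ^replace first = substr(reasons,1,index(reasons,",")-1)^
    . ^replace first = reasons if first == ""^

The last line covers values that contain no comma, for which ^index()^
returns 0. In more recent Stata releases ^index()^ is documented under
the name ^strpos()^.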


Options
-------

^generate(^stub^)^ generates new string variables stub^1^, stub^2^, etc.
    containing parts 1, 2, etc. of strvar. The number of new variables
    created will be (at most) the maximum number of parts present in a
    string value. With ^g(mystr)^ and strings

    ^"Stata for data analysis"^
    ^"soap for washing"^

    4 variables will be created: ^mystr1^ to ^mystr4^. ^mystr1[1]^ will
    be ^"Stata"^ and ^mystr4[2]^ will be empty ^""^.

^gmax(^#^)^ specifies that only stub^1^ to stub# should be generated. A
    number greater than the maximum number of parts will have no effect.

^header(^headerstr^)^ specifies a heading for the table. The default is
    ^Parts^.

^punct(^punctchars^)^ specifies alternative punctuation characters
    deemed to separate parts. One or more characters may be specified.
    If ^punct( )^ is not specified, spaces are used. (Note that
    attempting to specify just a space will result in a syntax error.)

    If you wish to use the space ^" "^ as well as other non-space
    characters, specify it between two such characters: e.g. ^p(, ,)^.
    (Note that a leading or trailing space is ignored: ^p( ,)^ is
    treated as if it were ^p(,)^.) Repeating a character within
    ^punct( )^ does no harm.

    As a special case, ^punct(no)^ indicates no punctuation characters.
    Each string will then be split into its individual characters,
    with spaces ignored.

^sort^ indicates that ^tabsort^ rather than ^tabulate^ will be used to
    produce a table with frequencies sorted from highest to lowest. This
    option may only be used if ^tabsort^ has been installed.

^saving(^filename^)^ saves an expanded data set to filename. This will
    contain a new variable ^_part^ containing each part of a string and
    a new variable ^_orig^ which is 1 for each original observation and
    0 otherwise. Thus after any ^use^ filename the original data set may
    be restored by

    . ^drop _part^
    . ^keep if _orig^
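
    A possible round trip is sketched below. The variable ^reasons^ and
    the file name ^parts^ are hypothetical, and it is assumed that the
    saved file can be read back with ^use^ in the usual way. Note that
    ^use^ with ^clear^ discards the data in memory, so save any changes
    you want to keep first.

    . ^tabsplit reasons, p(,) sa(parts)^
    . ^use parts, clear^
    . ^tab _part^
    . ^drop _part^
    . ^keep if _orig^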


Examples
--------

    . ^tabsplit reasons, p(,) h(Reasons)^

    .^ qui tabsplit make, gen(Make) gmax(1)^
    .^ tab Make1^
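
Some further sketches of the options, assuming hypothetical string
variables ^phrase^, ^reasons^ and ^codes^. If ^phrase^ holds the two
strings shown under ^generate( )^, the first pair of commands should
create and list ^mystr1^ to ^mystr4^:

    . ^tabsplit phrase, g(mystr)^
    . ^list mystr1-mystr4^

Here both commas and spaces separate parts:

    . ^tabsplit reasons, p(, ,) h(Reasons)^

Here each string is split into its individual characters:

    . ^tabsplit codes, p(no)^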


Author
------

         Nicholas J. Cox, University of Durham, U.K.
         n.j.cox@@durham.ac.uk


Also see
--------

On-line: help for @tabsort@ (if installed)