Stata 15 help for gettoken

[P] gettoken -- Low-level parsing


gettoken emname1 [emname2] : emname3 [, parse("pchars") quotes qed(lmacname) match(lmacname) bind]

where pchars are the parsing characters, lmacname is a local macro name, and emname is described in the following table:

emname is...           Refers to a ...
-----------------------------------
macroname              local macro
(local) macroname      local macro
(global) macroname     global macro
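As a small illustration of the table, the sketch below spells out the type of each macro explicitly. The macro names mylist, others, and FIRST are made up for this example.

```stata
* Illustrative names only: mylist, others, and FIRST are hypothetical
local mylist "alpha beta gamma"

* Take the first token from a local macro, store it in a global macro,
* and store the remainder in another local macro
gettoken (global) FIRST (local) others : (local) mylist

* Displays: alpha
display "$FIRST"
```

Omitting the prefix, as in gettoken first : mylist, treats every name as a local macro.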


gettoken is a low-level parsing command designed for programmers who wish to parse input for themselves. The syntax command is an easier-to-use, high-level parsing command.

gettoken obtains the next token from the macro emname3 and stores it in the macro emname1. If emname2 is specified, the rest of the string from emname3 is stored in the macro emname2. emname1 and emname3, or emname2 and emname3, may be the same name. The first token is determined by the parsing characters pchars, which default to a space if not specified.


parse("pchars") specifies the parsing characters. If parse() is not specified, parse(" ") is assumed, meaning tokens are identified by blanks.
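A minimal sketch of parse() with a comma as the only parsing character, so that blanks no longer separate tokens. The macro names s, vars, rest, and comma are illustrative.

```stata
* Illustrative input: a variable list followed by an option
local s "y x1 x2, robust"

* With parse(","), blanks are not delimiters, so everything up to
* the comma comes back as a single token
gettoken vars rest : s, parse(",")

* Displays: y x1 x2
display "`vars'"

* A parsing character other than a blank is returned as a token itself
gettoken comma rest : rest, parse(",")

* Displays: ,
display "`comma'"
```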

quotes specifies that the outside quotes are not to be stripped in what is stored in emname1. This option has no effect on what is stored in emname2 because it always retains outside quotes. quotes is a rarely specified option; usually you want the quotes stripped. You would not want the quotes stripped if you wanted to make a perfect copy of the contents of the original macro for parsing at a later time.
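The difference can be seen by taking the same quoted token with and without the option. This is a sketch with illustrative macro names (s, stripped, kept).

```stata
* s contains: "hello world" rest
local s `""hello world" rest"'

* Default: the outside quotes are stripped from the token
gettoken stripped : s

* With quotes: the outside quotes are retained
gettoken kept : s, quotes

* Displays: hello world
display `"`stripped'"'

* Displays: "hello world"
display `"`kept'"'
```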

qed(lmacname) specifies a local macro name that is to be filled in with 1 or 0 according to whether the returned token was enclosed in quotes in the original string. qed() does not change how parsing is done; it merely returns more information.
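A short sketch of qed() reporting on two successive tokens, one quoted and one not. The macro names s, tok, q1, and q2 are illustrative.

```stata
* s contains: "hello world" rest
local s `""hello world" rest"'

* First token was enclosed in quotes, so q1 is set to 1
gettoken tok s : s, qed(q1)

* Second token (rest) was not quoted, so q2 is set to 0
gettoken tok s : s, qed(q2)

* Displays: first quoted = 1, second quoted = 0
display "first quoted = `q1', second quoted = `q2'"
```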

match(lmacname) specifies that parentheses be matched in determining the token. The outer level of parentheses, if any, is removed before the token is stored in emname1. The local macro lmacname is set to "(" if parentheses were found; otherwise, it is set to an empty string.
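For instance, a parenthesized group at the start of a string can be pulled out as one token. The sketch below uses illustrative macro names (s, tok, rest, paren).

```stata
local s "(a b c) d e"

* The parenthesized group is treated as one token; the outer
* parentheses are removed and paren records that they were found
gettoken tok rest : s, match(paren)

* Displays: a b c
display "`tok'"

* Displays: (
display "`paren'"
```

Checking whether paren is empty afterward is one way to tell a parenthesized group from an ordinary token.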

bind specifies that expressions within parentheses and those within brackets are to be bound together, even when not parsing on () and [].


Often we apply gettoken to the macro `0', as in

gettoken first : 0

which obtains the first token (with spaces as token delimiters) from `0' and leaves `0' unchanged. Or, alternatively,

gettoken first 0 : 0

which obtains the first token from `0' and saves the rest back in `0'.
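Repeating the second form in a loop consumes a string one token at a time. The following is a minimal sketch on an ordinary local macro rather than `0'; the names rest, tok, and n are illustrative.

```stata
local rest "alpha beta gamma"
local n 0

* Take the first token, then keep going until none is left
gettoken tok rest : rest
while "`tok'" != "" {
    local ++n
    display "token `n': `tok'"
    gettoken tok rest : rest
}

* Displays: 3 tokens found
display "`n' tokens found"
```

Testing the token rather than the remainder means the loop also ends cleanly if the string has trailing blanks.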


Assume that `0' contains `"by x: cmd if sex=="male""'

. gettoken left 0: 0, parse(": ")

results in `left' containing `"by"'
           `0'    containing `" x: cmd if sex=="male""'


. gettoken next 0: 0, parse(": ")

results in `next' containing `"x"'
           `0'    containing `": cmd if sex=="male""'


. gettoken next 0: 0, parse(": ")

results in `next' containing `":"'
           `0'    containing `" cmd if sex=="male""'

You wish to create a two-word command. For example, mycmd list does one thing and mycmd generate does another.

program mycmd
        version 15.1
        gettoken subcmd 0: 0
        if "`subcmd'"=="list" {
                mycmd_l `0'
        }
        else if "`subcmd'"=="generate" {
                mycmd_g `0'
        }
        else    error 199
end

program mycmd_l
        ...
end

program mycmd_g
        ...
end

© Copyright 1996–2018 StataCorp LLC