
st: Substring extraction based on punctuation

From   "Michael S. Hanson" <[email protected]>
To   [email protected]
Subject   st: Substring extraction based on punctuation
Date   Wed, 29 Jun 2005 19:30:13 -0400

I have a (large) set of variables with labels of the (general) form:

Some text, some more text, still more text
Also some text, lots and lots more text, text

The commas are the separators of interest to me: I would like to extract the substrings before, between, and after the commas (excluding the commas and the spaces that follow them) into three local macros for further use. The number of words in each part of the label varies, as does the total number of words; hence the -word # of `varname'- extended macro does not appear to apply here. The closest I have come with extended macros is:

local varlbl : variable label `varname'
local varlbl1 : piece 1 20 of "`varlbl'"
local varlbl2 : piece 2 20 of "`varlbl'"
local varlbl3 : piece 3 20 of "`varlbl'"

but this doesn't reliably return the desired substrings, given the variation in the number and length of words between commas: 20 here is simply an approximate value that works for a particular subset of labels. The -nobreak- option has the same problem. (This code also does not strip off the commas.)

So instead of extended macros, I've tried using string functions. I suspect that if I knew and understood regular expression syntax, I could make use of -regexm- and -regexs- on `varlbl' -- but I don't. Instead, the following "works":

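local varlbl : variable label `varname'
* total length of the label
local l = length("`varlbl'")
* position of the first comma, counted from the left
local c1 = strpos("`varlbl'",",")
* position of the last comma, counted from the right
local c2 = strpos(reverse("`varlbl'"),",")
* assumes exactly two commas, each followed by a single space
local varlbl1 = substr("`varlbl'",1,`c1'-1)
local varlbl2 = substr("`varlbl'",`c1'+2,`l'-`c1'-`c2'-1)
local varlbl3 = substr("`varlbl'",`l'-`c2'+3,`l')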

... but I'm really hoping to find some alternative code that is "cleaner" and more transparent. Any such suggestions are welcome. Thanks in advance.
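
One possibility along these lines is -tokenize- with the -parse(",")- option, which splits on commas only and returns each comma as a token of its own. The following is a minimal sketch, not a tested solution: it assumes every label contains exactly two commas, and the scratch variable x and its label are made up purely for illustration.

```stata
* Illustrative setup only: a scratch variable with a made-up label
clear
set obs 1
generate x = 1
label variable x "Some text, some more text, still more text"
local varname x

local varlbl : variable label `varname'

* Split on commas only. The commas come back as tokens too, so the
* three parts of interest land in `1', `3', and `5'
* (`2' and `4' hold the commas themselves).
tokenize `"`varlbl'"', parse(",")

* trim() drops any space left over after each comma
local varlbl1 = trim(`"`1'"')
local varlbl2 = trim(`"`3'"')
local varlbl3 = trim(`"`5'"')

display "[`varlbl1'] [`varlbl2'] [`varlbl3']"
```

Note that -tokenize- overwrites the positional macros `1', `2', and so on, so inside a program any arguments still needed should be saved first. A label with fewer than two commas would simply leave the later parts empty.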

-- Mike
