The Stata listserver

st: RE: RE: Substring extraction based on punctuation


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Substring extraction based on punctuation
Date   Thu, 30 Jun 2005 01:14:56 +0100

-split- was written for precisely this purpose. 

Follow with -trim()-. 
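
A minimal sketch of that approach, assuming the label text has first been placed in a string variable (-split- operates on string variables). The variable name lbl and the stub part are hypothetical, and the two example labels from the question are typed in by hand. -split- parses on the comma, and -trim()- then removes the space that followed each comma.

```stata
* Sketch only: lbl and the part* stub are hypothetical names
clear
input str60 lbl
"Some text, some more text, still more text"
"Also some text, lots and lots more text, text"
end

* Split on commas; this creates part1, part2 and part3
split lbl, parse(",") generate(part)

* Pieces after the first keep the space that followed the comma
foreach v of varlist part* {
    replace `v' = trim(`v')
}

list part*
```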

Nick 
[email protected] 

David E Moore
 
> Have you considered -tokenize- using "," as the parse character?
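
A minimal sketch of the -tokenize- route, which works directly on a local macro holding the label. The variable x and its label are made up for illustration. Note that with parse(",") the commas themselves are returned as tokens, so the three pieces of text land in the first, third and fifth positional macros; -trim()- is applied because the space after each comma is not a parse character.

```stata
* Sketch only: the variable x and its label are made up for illustration
clear
set obs 1
generate x = 1
label variable x "Some text, some more text, still more text"

local varlbl : variable label x

* With parse(","), the commas come back as tokens of their own,
* so the text pieces are in `1', `3' and `5'
tokenize `"`varlbl'"', parse(",")

local varlbl1 = trim(`"`1'"')
local varlbl2 = trim(`"`3'"')
local varlbl3 = trim(`"`5'"')

display `"[`varlbl1'] [`varlbl2'] [`varlbl3']"'
```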

Michael S. Hanson
 
> I have a (large) set of variables with labels of the (general) form:
> 
> 	Some text, some more text, still more text
> 	Also some text, lots and lots more text, text
> 	(etc.)
> 
> The commas are the separators of interest to me:  I would like to 
> extract the sub-strings before, between and after the commas 
> (excluding the commas and trailing spaces themselves) into three 
> local string variables for further use.  The number of words in each 
> part of the label varies, as does the total number of words;  hence 
> the -word # of `varname'- extended macro does not appear to apply 
> here.  The closest I have come with extended macros is:
> 
> 	local varlbl : variable label `varname'
> 	local varlbl1 : piece 1 20 of "`varlbl'"
> 	local varlbl2 : piece 2 20 of "`varlbl'"
> 	local varlbl3 : piece 3 20 of "`varlbl'"
> 
> but this doesn't reliably return the desired substrings (given the 
> variation in words (and in word length) between commas) -- 20 here is 
> simply an approximate value that works for a particular subset of 
> labels.  Same with the -nobreak- option.  (This code also does not 
> strip off the commas.)
> 
> So instead of extended macros, I've tried using string functions.  I 
> suspect that if I knew and understood regular expression syntax, I 
> could make use of -regexm- and -regexs- on `varlbl' -- but I don't.  
> Instead, the following "works":
> 
> 	local varlbl : variable label `varname'
> 	local l = length("`varlbl'")
> 	local c1 = strpos("`varlbl'",",")
> 	local c2 = strpos(reverse("`varlbl'"),",")
> 	local varlbl1 = substr("`varlbl'",1,`c1'-1)
> 	local varlbl2 = substr("`varlbl'",`c1'+2,`l'-`c1'-`c2'-1)
> 	local varlbl3 = substr("`varlbl'",`l'-`c2'+3,`l')
> 
> ... but I'm really hoping to find some alternative code that is 
> "cleaner" and more transparent.  Any such suggestions are welcome.  
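
For the regular-expression route mentioned in the question, a minimal sketch with -regexm()- and -regexs()- might look like the following. It assumes the label always contains at least two commas; the label text is typed in directly for illustration, and the pattern is one possible choice rather than the only one.

```stata
* Sketch only: the label text is typed in directly for illustration
local varlbl "Some text, some more text, still more text"

* Two groups of non-comma text, then a final group with the remainder;
* each comma may be followed by spaces, which are skipped
if regexm(`"`varlbl'"', "^([^,]*), *([^,]*), *(.*)$") {
    local varlbl1 = trim(regexs(1))
    local varlbl2 = trim(regexs(2))
    local varlbl3 = trim(regexs(3))
}

display `"[`varlbl1'] [`varlbl2'] [`varlbl3']"'
```

If a label has fewer than two commas the pattern does not match and the three macros are left undefined, so a check on the result of -regexm()- is worth keeping.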
