Statalist The Stata Listserver



st: RE: separating string variable into components


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: separating string variable into components
Date   Mon, 14 Aug 2006 11:57:16 +0100

As the original author of -split-, I can see a fairly painless
way to use that command. The trick is to see that you definitely
do _not_ want to parse on spaces, as not only is there a variable
number of spaces between substrings, but also spaces separate
elements within substrings. But each substring ends with 
a right parenthesis. Now 

split mystring, p(")") 

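parses on right parentheses and deletes them, but putting them
back is trivial. Do that only where a substring was found, or
observations with fewer substrings will end up with a lone ")"
in the later variables. trim() removes any spaces left in front
of each substring: 

foreach v in `r(varlist)' { 
	replace `v' = trim(`v') + ")" if trim(`v') != "" 
} 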

Or you may prefer to take out the left parentheses instead, for which 

foreach v in `r(varlist)' { 
	replace `v' = trim(subinstr(`v', "(", "", .)) 
}

Note here that r(varlist) is left behind by -split- as a list
of the names of the variables it creates, but will be zapped by 
the next r-class command. You can do it directly by naming those
variables if you prefer. 
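
A minimal sketch of the whole sequence, using the two example
observations quoted below and copying r(varlist) into a local
macro so that the list survives any later r-class command. The
stub name part, the local name newvars and the str60 storage
type are assumptions made for illustration only.

```stata
clear
input str60 mystring
"   (1 2 3) (1 2 2)  (7 8 9)    (1 3 4)"
" (2 3 4)    (1 2 3) (10 11 12)"
end

* parse on right parentheses; new variables are named part1, part2, ...
split mystring, parse(")") generate(part)

* keep the names safe from the next r-class command
local newvars `r(varlist)'

foreach v of local newvars {
	* restore the parenthesis only where a substring was found
	replace `v' = trim(`v') + ")" if trim(`v') != ""
}

list part*, noobs
```

Once the names are in newvars, other commands can run between
-split- and the loop without losing the list.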

Nick 
n.j.cox@durham.ac.uk 

P.S. I follow the terminology that () are parentheses, [] are
brackets and {} are braces. Using brackets in the wide
sense either creates ambiguity or commits you to needing
to say round, square and curly to disambiguate. 

Radu Ban
 
> I have a string variable that looks like this:
> 
> mystring
>    (1 2 3) (1 2 2)  (7 8 9)    (1 3 4)
>  (2 3 4)    (1 2 3) (10 11 12)
> 
> etc. The numbers inside the brackets are made up. The problem is that
> the number of spaces between brackets is not constant. Also the number
> of brackets is not constant across observations. I want to split this
> variable so that each bracket is contained in its own variable, i.e.
> 
> split1   split2    split3         split4
> (1 2 3)  (1 2 2)  (7 8 9)        (1 3 4)
> (2 3 4)  (1 2 3)  (10 11 12)   <blank>
> 
> I've tried the -split- command, with various numbers of spaces as the
> parse character, but that doesn't work, i.e. it doesn't split if I
> specify too many blanks, or it creates blank observations if I specify
> too few blanks.



