Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: splitting strings

From: Daniel Henriksen
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: splitting strings
Date: Mon, 14 Feb 2011 09:47:00 +0100

```
Hello Nick, Scott and Eric (and everyone else)

Thank you all for your input, the suggested solutions, and for
pointing out the differences! Nick's solution is the closest to what I
need, but I think I know where I can use Scott's way (in another
context).

@ Nick: I'm glad I didn't miss something when searching the archives
for this problem :-)

Cheers
Daniel

2011/2/10 Eric Booth <ebooth@ppri.tamu.edu>:
> <>
>
> Scott's solution doesn't check V2 for the word that should parse V1 (within the same observation), but instead checks whether V1 contains some text found anywhere in V2.
> This could be a problem if V1 contains segments that can be found in several levels of V2, see the 3rd observation I added below:
>
> ***!
> clear
> input str39 v1 str10 v2
> "hello John Smith how are you?" "John Smith"
> "I'm fine Jane, how about you?" "Jane,"
> "I'm Jane, but you're John Smith, right?" ", ri"
> end
> levelsof v2
> return list
> split v1, parse(`=r(levels)')
> l
> ***!
>
> Nick's solution would split the 3rd obs on "ri", but Scott's would split it on "Jane" from the 2nd obs.
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> ebooth@ppri.tamu.edu
> Office: +979.845.6754
>
>
>
> On Feb 10, 2011, at 7:20 AM, Scott Merryman wrote:
>
>> If the data set is small, then this should work:
>>
>> clear
>> input str29 v1 str10 v2
>> "hello John Smith how are you?" "John Smith"
>> "I'm fine Jane, how about you?" "Jane,"
>> end
>> levelsof v2
>> return list
>> split v1, parse(`=r(levels)')
>> l
>>
>> Scott
>>
>>
>> On Thu, Feb 10, 2011 at 7:08 AM, Daniel Henriksen
>> <henriksen.dp@gmail.com> wrote:
>>> I'm sorry, the example is unreadable.
>>> In observation one:
>>> V1: hello John Smith how are you?
>>> V2: John Smith
>>> V1_1: hello
>>> V1_2: how are you?
>>>
>>> Observation two:
>>> V1: I'm fine Jane, how about you?
>>> V2: Jane,
>>> V1_1: I'm fine
>>>
>>> So the splitting of V1 varies from observation to observation,
>>> depending on the string text in V2.
>>> Hope this makes sense.
>>>
>>> /Daniel
>>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>
>

2011/2/10 Nick Cox <n.j.cox@durham.ac.uk>:
> What a nice question. I'm credited as the original author of -split-, following earlier joint work with Michael Blasnik, and I don't think I thought of this rather natural question when writing it.
>
> I checked to see whether I had solved it by accident and I hadn't. Whatever you specify as argument to -parse()- is taken literally and not checked to see if it is a variable name.
>
> However, there is a work-around.
>
> clonevar V1_2 = V1
> replace V1_2 = subinstr(V1_2, V2, "&", .)
> split V1_2, parse(&)
>
> The essential point is to use as the new separator -- "&" in the example -- something that does not otherwise occur in V1. You can test any potential separator by e.g.
>
> assert strpos(V1, "&") == 0
>
> Nick
> n.j.cox@durham.ac.uk
>
> Daniel Henriksen
>
> I have a question regarding splitting up strings.
> Is it possible to split up a string using a string from another
> variable defined in the same observation? I'm thinking of using the
> "split" command.
> Here's an example, where V1 is the string I'd like to split and V2 is
> where I'd like to split (different for each observation). V1_1 and
> V1_2 are the results of the splitting.
> V1                               V2            V1_1        V1_2
> hello John Smith how are you?    John Smith    hello       how are you?
> I'm fine Jane, how about you     Jane,         I'm fine
>
>

--
Daniel Henriksen
Ph.d. studerende, læge
Infektionsmedicinsk afd Q / Akut Modtage Afdelingen
Odense Universitetshospital
Bygning 2, 1. sal
Sdr. Boulevard 29
5000 Odense C

```
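
For readers who want to try the approaches discussed in this thread, here is a minimal sketch in Stata. It combines Nick's workaround (including his separator check) with Eric's three-observation example data. The second half is not from the thread: it is one possible alternative that cuts V1 at the first occurrence of V2 using `strpos()` and `substr()`, and the variable names `pos`, `before` and `after` are made up for illustration. The width `str40` is chosen only so that the longest example string is not truncated.

```stata
* Sketch only: example data taken from the thread (apostrophes typed as plain ASCII).
clear
input str40 V1 str10 V2
"hello John Smith how are you?"            "John Smith"
"I'm fine Jane, how about you?"            "Jane,"
"I'm Jane, but you're John Smith, right?"  ", ri"
end

* Nick's workaround: confirm the placeholder separator is unused in V1,
* replace each observation's own V2 text with it, then split on the placeholder.
assert strpos(V1, "&") == 0
clonevar V1_2 = V1
replace V1_2 = subinstr(V1_2, V2, "&", .)
split V1_2, parse(&)

* Hypothetical alternative (not from the thread): cut V1 at the first
* occurrence of V2 within the same observation, without calling -split-.
gen pos = strpos(V1, V2)
gen before = trim(substr(V1, 1, pos - 1)) if pos > 0
gen after = trim(substr(V1, pos + length(V2), .)) if pos > 0

list V1_21 V1_22 before after
```

Note that the two halves can differ when V2 occurs more than once in V1: `subinstr()` with `.` as its last argument replaces every occurrence, so `split` may then create more than two new variables, whereas the `strpos()` version only cuts at the first occurrence. The alternative also applies `trim()` to the pieces, which the workaround does not do explicitly.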