Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Daniel Henriksen <henriksen.dp@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: splitting strings
Date: Mon, 14 Feb 2011 09:47:00 +0100

Hello Nick, Scott and Eric (and everyone else)

Thank you all for your input, the suggested solutions, and for pointing out the differences!
Nick's solution is the closest to what I need, but I think I know where I can use Scott's way (in another context).

@ Nick: I'm glad I didn't miss something when searching the archives for this problem :-)

Cheers
Daniel

2011/2/10 Eric Booth <ebooth@ppri.tamu.edu>:
> <>
>
> Scott's solution doesn't check V2 for the word that should parse V1 (within the same observation), but instead checks whether V1 contains some text found anywhere in V2.
> This could be a problem if V1 contains segments that can be found in several levels of V2; see the 3rd observation I added below:
>
> ***!
> clear
> input str39 v1 str10 v2
> "hello John Smith how are you?" "John Smith"
> "I’m fine Jane, how about you?" "Jane,"
> "I'm Jane, but you're John Smith, right?" ", ri"
> end
> levelsof v2
> return list
> split v1, parse(`=r(levels)')
> l
> ***!
>
> Nick's solution would split the 3rd obs on "ri", but Scott's would split it on "Jane" from the 2nd obs.
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> ebooth@ppri.tamu.edu
> Office: +979.845.6754
>
>
> On Feb 10, 2011, at 7:20 AM, Scott Merryman wrote:
>
>> If the data set is small then this should work:
>>
>> clear
>> input str29 v1 str10 v2
>> "hello John Smith how are you?" "John Smith"
>> "I’m fine Jane, how about you?" "Jane,"
>> end
>> levelsof v2
>> return list
>> split v1, parse(`=r(levels)')
>> l
>>
>> Scott
>>
>>
>> On Thu, Feb 10, 2011 at 7:08 AM, Daniel Henriksen
>> <henriksen.dp@gmail.com> wrote:
>>> I'm sorry, the example is unreadable.
>>> In observation one:
>>> V1: hello John Smith how are you?
>>> V2: John Smith
>>> V1_1: hello
>>> V1_2: how are you?
>>>
>>> Observation two:
>>> V1: I’m fine Jane, how about you?
>>> V2: Jane,
>>> V1_1: I'm fine
>>> V1_2: how about you?
>>>
>>> So the splitting of V1 varies from observation to observation
>>> depending on the string text in V2
>>> Hope this makes sense
>>>
>>> /Daniel
>>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/

2011/2/10 Nick Cox <n.j.cox@durham.ac.uk>:
> What a nice question. I'm credited as the original author of -split-, following earlier joint work with Michael Blasnik, and I don't think I thought of this rather natural question when writing it.
>
> I checked to see whether I had solved it by accident and I hadn't. Whatever you specify as argument to -parse()- is taken literally and not checked to see if it is a variable name.
>
> However, there is a work-around.
>
> clonevar V1_2 = V1
> replace V1_2 = subinstr(V1_2, V2, "&", .)
> split V1_2, parse(&)
>
> The essential point is to use as new separator -- "&" in the example -- something that does not otherwise occur. You can test any potential separator by e.g.
>
> assert strpos(V1, "&") == 0
>
> Nick
> n.j.cox@durham.ac.uk
>
> Daniel Henriksen
>
> I have a question regarding splitting up strings.
> Is it possible to split up a string using a string from another
> variable defined in the same observation? I'm thinking of using the
> "split" command.
> Here's an example, where V1 is the string I'd like to split and V2 is
> where I'd like to split (different for each observation). V1_1 and
> V1_2 are the results of the splitting
> V1                                   V2
> V1_1          V1_2
> hello John Smith how are you?        John Smith     hello      how are you?
> I'm fine Jane, how about you         Jane,          I'm fine
> how about you?

--
Daniel Henriksen
PhD student, physician
Infektionsmedicinsk afd Q / Akut Modtage Afdelingen
Odense Universitetshospital
Bygning 2, 1. sal
Sdr. Boulevard 29
5000 Odense C
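
For anyone finding this thread later, Nick's work-around and his separator check can be combined with Eric's three-observation example into one do-file. This is only a minimal sketch under a few assumptions that are not in the posts above: v1 is declared as str39 so that the third observation is not truncated, straight apostrophes are used in the data, the names v1_tmp and part are made up for illustration, and the final trim() step is an extra to remove blanks left around the pieces.

```stata
clear
input str39 v1 str10 v2
"hello John Smith how are you?" "John Smith"
"I'm fine Jane, how about you?" "Jane,"
"I'm Jane, but you're John Smith, right?" ", ri"
end

* the placeholder separator must not already occur in v1
assert strpos(v1, "&") == 0

* replace each observation's own v2 text with the placeholder,
* then split on the placeholder (v1_tmp and part are illustrative names)
clonevar v1_tmp = v1
replace v1_tmp = subinstr(v1_tmp, v2, "&", .)
split v1_tmp, parse(&) generate(part)

* assumption: surrounding blanks are not wanted in the pieces
replace part1 = trim(part1)
replace part2 = trim(part2)

list v2 part1 part2
```

Because subinstr() is evaluated observation by observation, each row is split only on its own v2 value, which is the behaviour Eric contrasts with the -levelsof- approach. If "&" can occur in the data, any other string that passes the -assert- check should do as the placeholder.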