Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: How do I split a string variable without spaces by capital letters?


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: How do I split a string variable without spaces by capital letters?
Date   Mon, 19 Aug 2013 17:19:33 +0100

I like -moss- too for a variety of reasons.

It's important to understand, however, why the approach with -split-
does not work. -split- expects that strings can be parsed into
substrings using separators. Part of the definition of separators is
that they can be thrown away, but that's not true here: the capital
letters are part of the substrings you want to keep.

Independently of that, the reason that -split- did nothing to the
original variables is that it was looking for literal strings such as

"upper(a-z)"

as separators, and did not find any examples. The syntax is not
illegal, but it's a long way from doing what you wanted, as -split-
does not understand regular expressions and won't apply functions to
them either.
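
A minimal sketch of that literal matching, using a made-up observation
that happens to contain the text upper(a-z). The variable name s and
its contents are hypothetical, not taken from the original data:

```stata
clear
input str20 s
"fooupper(a-z)bar"
end

* p() is taken as a literal separator string, not evaluated as a function
split s, p(upper(a-z))
list
```

Here -split- should find its separator, so the text on either side ends
up in s1 and s2 and the separator itself is discarded.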

split v1, p(`c(ALPHA)')

would have chopped whenever it saw any of A ... Z but those upper-case
letters would have been thrown out too.
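
One workaround that stays within -split-, sketched here as an
alternative and not something proposed in this thread, is to create a
separator that can safely be thrown away: put a space before every
capital letter, then split on spaces. It assumes v1 contains no spaces
to begin with; the variable name spaced is made up for illustration.

```stata
clear
input str13 v1
"TestOne"
"ThisistestTwo"
"AndThree"
end

* copy v1, then put a space in front of each upper-case letter
gen spaced = v1
foreach L in `c(ALPHA)' {
    replace spaced = subinstr(spaced, "`L'", " `L'", .)
}

* drop the leading space created by the initial capital
replace spaced = trim(spaced)

* default -split- parses on spaces, which are safe to discard
split spaced
list v1 spaced1 spaced2
```

This uses only base functions (subinstr(), trim()) and -split-, so it
should not need anything installed from SSC.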

Nick
[email protected]


On 19 August 2013 16:31, Robert Picard <[email protected]> wrote:
> You can use -moss- (available from SSC) to handle this problem. The
> following works with your example:
>
> moss v1, match("([A-Z][^A-Z]*)") regex
>
> The pattern indicates that you are looking for substrings that start
> with a capital letter (i.e. [A-Z]) followed by zero or more non-capital
> letters (i.e. [^A-Z]*).
>
> On Mon, Aug 19, 2013 at 10:06 AM, Andrew Dickens <[email protected]> wrote:
>> Hi all,
>>
>> I'm currently running Stata 10, and I'm having a problem splitting a string
>> variable by capital letters. Elena Vidal posted something under a similar
>> title, http://www.stata.com/statalist/archive/2011-11/msg01195.html, but
>> her problem is somewhat different from mine and I was unable to
>> troubleshoot.
>>
>> An example of my data is as follows:
>>
>> clear all
>> inp str13(v1)
>> "TestOne"
>> "ThisistestTwo"
>> "AndThree"
>> end
>>
>> The problem is the capital letter I wish to split each cell by is not
>> consistently placed.
>>
>> I tried splitting using this code:
>>
>> split v1, p(upper(a-z))
>> or
>> split v1, p(upper(.))
>>
>> but this just generates an identical variable to v1.
>>
>> What I would like to do is create two new variables, so the first
>> observation of my example would have "Test" in the first new variable and
>> "One" in the second new variable. Suggestions would be greatly appreciated.
>>
>> Thank you for your consideration.
>>
>> Andrew
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

