
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: How do I split a string variable without spaces by capital letters?


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: How do I split a string variable without spaces by capital letters?
Date   Tue, 20 Aug 2013 00:50:31 +0100

My line was

 replace v2 = subinstr(v2, "`L'", " `L'", .)

but you left out the first comma. You had

replace v1= subinstr(v1 "`L'", " `L'", .)

and Stata complained that it can make no sense of

v1 "A"

Not using

clonevar v2 = v1

was not an error here. But why did I clone -v1- as -v2-? The
suggestion is always to leave the original of such a string variable
in your dataset until you are sure that you have no further need for it.
Suppose you work on -v1- and then mess it up. That's no good, as you
have to read it in again.
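
As a minimal sketch of both points (the comma after the first argument of subinstr(), and working on a clone so that -v1- stays intact), the example data from this thread can be run as below. The trim() step and the final -list- are additions for illustration only and are not part of the code posted in the thread.

```stata
clear all
input str13 v1
"TestOne"
"ThisistestTwo"
"AndThree"
end

* work on a copy, so that v1 survives any mistake made below
clonevar v2 = v1

* subinstr(s, from, to, n): note the comma after the first argument;
* n = . means "replace all occurrences"
foreach L in `c(ALPHA)' {
    quietly replace v2 = subinstr(v2, "`L'", " `L'", .)
}

* every value starts with a capital, so each v2 now has a leading space;
* trim() removes it (illustrative extra step, assumed harmless here)
quietly replace v2 = trim(v2)

list v1 v2
```

If the loop goes wrong, -drop v2- and clone again; there is no need to read the data in a second time.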

Nick
[email protected]


On 20 August 2013 00:37, Haluk Vahaboglu <[email protected]> wrote:
> Nick may I ask a simple question (surely not simple to me),
> I am trying to learn the secrets of Stata. For this purpose, I test code posted to this list on my Stata 12.1 (Ubuntu 64-bit) system, since it might be useful in my future studies.
> In this context, I run your loop with a small modification as shown below:
>
> clear all
> inp str13(v1)
> "TestOne"
> "ThisistestTwo"
> "AndThree"
> end
> foreach L in `c(ALPHA)' {
>         replace v1= subinstr(v1 "`L'", " `L'", .)
> }
>
> It is really a surprise to me but this did not work. Returned error:
> v1"A" invalid name
> r(198);
>
> It is working in the format you posted:
> clonevar v2 = v1
> qui foreach L in `c(ALPHA)' {
>         replace v2 = subinstr(v2, "`L'", " `L'", .)
> }
>
>  I wonder why this loop fails without "clonevar v2=v1"? I guess there is a very easy answer to this which I cannot see.

>> Date: Mon, 19 Aug 2013 17:33:23 +0100
>> Subject: Re: st: How do I split a string variable without spaces by capital letters?
>> From: [email protected]
>> To: [email protected]
>>
>> Along these lines you could prefix every upper-case letter with a space.
>>
>> clonevar v2 = v1
>>
>> qui foreach L in `c(ALPHA)' {
>>         replace v2 = subinstr(v2, "`L'", " `L'", .)
>> }
>>
>> split v2
>>
>> For c(ALPHA) see results of -creturn list-.
>>
>> That doesn't presuppose just two substrings.
>>
>> Nick
>> [email protected]
>>
>>
>> On 19 August 2013 16:36, Eric A. Booth <[email protected]> wrote:
>>> <>
>>> Agreed, -moss- is great for this, but also you can do this using
>>> built-in string functions if you are interested, example:
>>>
>>> *****************!
>>> clear all
>>> inp str13(v1)
>>> "TestOne"
>>> "ThisistestTwo"
>>> "AndThree"
>>> end
>>>
>>> g v2 = reverse(v1)
>>> g pos = .
>>> g l = length(v1)
>>> foreach x in `c(ALPHA)' {
>>>    replace pos = strpos(v2, "`x'") if inlist(pos, ., 0, l)
>>>   }
>>> drop v2
>>> g first = substr(v1, 1, l-pos)
>>> g second = substr(v1, l-pos+1, l)
>>> list
>>> *****************!
>>> EAB
>>>
>>>
>>>
>>> On Mon, Aug 19, 2013 at 10:31 AM, Robert Picard <[email protected]> wrote:
>>>> You can use -moss- (available from SSC) to handle this problem. The
>>>> following works with your example:
>>>>
>>>> moss v1, match("([A-Z][^A-Z]*)") regex
>>>>
>>>> The pattern indicates that you are looking for substrings that start
>>>> with a capital letter (i.e [A-Z]) followed by zero or more non-capital
>>>> letters (i.e. [^A-Z]*).
>>>>
>>>> On Mon, Aug 19, 2013 at 10:06 AM, Andrew Dickens <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm currently running Stata 10, and I'm having a problem splitting a string
>>>>> variable by capital letters. Elena Vidal posted something under a similar
>>>>> title, http://www.stata.com/statalist/archive/2011-11/msg01195.html, but
>>>>> her problem is somewhat different from mine and I was unable to
>>>>> troubleshoot.
>>>>>
>>>>> An example of my data is as follows:
>>>>>
>>>>> clear all
>>>>> inp str13(v1)
>>>>> "TestOne"
>>>>> "ThisistestTwo"
>>>>> "AndThree"
>>>>> end
>>>>>
>>>>> The problem is the capital letter I wish to split each cell by is not
>>>>> consistently placed.
>>>>>
>>>>> I tried splitting using this code:
>>>>>
>>>>> split v1, p(upper(a-z))
>>>>> or
>>>>> split v1, p(upper(.))
>>>>>
>>>>> but this just generates an identical variable to v1.
>>>>>
>>>>> What I would like to do is create two new variables, so the first
>>>>> observation of my example would have "Test" in the first new variable and
>>>>> "One" in the second new variable. Suggestions would be greatly appreciated.