
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: regular expression -split string an unknown number of times


From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: regular expression -split string an unknown number of times
Date   Thu, 4 Aug 2011 18:08:46 +0000


I couldn't get your second regexs()/regexm() command to grab the middle name either, but you could use a loop to grab each proper-case name in the string:

***********
clear

inp str20(var)
"JasonBJones"
"JohnPaulJones"
"GenePSmith"
"JayRyanMcArthur"
end

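* work on a copy so the original variable is left untouched
clonevar testvar = var

* each pass peels the leading name off testvar (assumes at most 5 names per string)
forval n = 1/5 {
    gen teststring`n' = regexs(1) if regexm(testvar,"^([A-Z]+[a-z]+)")
    replace testvar = subinstr(testvar, teststring`n', "", 1)
}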

drop testvar
***********

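Note that (1) I changed the pattern in your regexm() to "^([A-Z]+[a-z]+)" so that names with a middle initial instead of a middle name still match (the initial is captured together with the name that follows it, e.g. "BJones"), and (2) if you've got names with internal capitalization like "McArthur", they get split into separate pieces ("Mc" and "Arthur") and you'll need to put them back together.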
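
To get from the separate pieces back to a single string with spaces (the "John Paul Jones" form asked for), the pieces can be joined in a second loop. This is only a minimal sketch: it repeats the example data and the splitting loop so that it runs on its own, and the variable name spaced and the width str40 are assumptions chosen for illustration.

```stata
clear
input str20 var
"JasonBJones"
"JohnPaulJones"
"GenePSmith"
"JayRyanMcArthur"
end

* split into pieces with the same loop as above
clonevar testvar = var
forval n = 1/5 {
    gen teststring`n' = regexs(1) if regexm(testvar, "^([A-Z]+[a-z]+)")
    replace testvar = subinstr(testvar, teststring`n', "", 1)
}
drop testvar

* join the pieces with single spaces; trim() drops the blank
* that an empty piece would otherwise leave at either end
gen str40 spaced = ""
forval n = 1/5 {
    replace spaced = trim(spaced + " " + teststring`n')
}

list var spaced
```

With the example data this should give "John Paul Jones" for the second name, while "JasonBJones" becomes "Jason BJones" and "JayRyanMcArthur" becomes "Jay Ryan Mc Arthur", so the two caveats above still apply to the joined result.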

- Eric

On Aug 4, 2011, at 12:54 PM, Rodini, Mark wrote:

> Greetings,
> 
> I have a simple question.  I have a list of strings representing names which lack any spaces and I'm trying to insert a space in the correct place or places to split out the names.
> For example, I might have:
> 
> JohnPaulJones
> 
> Which I'd like to turn into
> 
> John Paul Jones
> 
> 
> The rule is to insert a space before any upper case letter followed by a lower case.
> 
> gen teststring = regexs(1) if regexm(var,"^([A-Z][a-z]+)")
> 
> gives the first word.  I think I could do the following to get John Paul  
> 
> gen teststring = regexs(1) + " " + regexs(2) if regexm(var,"^([A-Z][a-z]+)([A-Z][a-z]+)")
> 
> The difficulty I'm having is that the number of subnames in a string is variable.  The example above has three subnames, but I might have one with two or four, etc.  I'm not sure how to program that.
> 
> Thanks for any help.
> Mark
> 
> 
> ----------------------------------------------
> Mark Rodini
> COMPASS LEXECON
> 1111 Broadway, Suite 1500
> Oakland, CA  94607
> 510-285-1258 (direct)
> 510-285-1240 (main)
> 510-285-1245 (fax)
> [email protected]
>  
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2018 StataCorp LLC