

From: Eric Booth <ebooth@ppri.tamu.edu>
To: "<statalist@hsphsun2.harvard.edu>" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: regular expression -split string an unknown number of times
Date: Thu, 4 Aug 2011 18:08:46 +0000

<>

I couldn't get your second regexs()/regexm() command to grab the middle name either, but you could use a loop to grab each proper-case name in the string:

***********
clear
inp str20(var)
"JasonBJones"
"JohnPaulJones"
"GenePSmith"
"JayRyanMcArthur"
end

* work on a copy so the original variable is left untouched
clonevar testvar = var
* each pass peels one name off the front of the copy (up to 5 names)
forval n = 1/5 {
    gen teststring`n' = regexs(1) if regexm(testvar, "^([A-Z]+[a-z]+)")
    replace testvar = subinstr(testvar, teststring`n', "", 1)
}
drop testvar
***********

Note that (1) I changed the pattern in your regexm() to "^([A-Z]+[a-z]+)" to deal with names that have a middle initial instead of a middle name (the initial stays attached to the name that follows it, e.g. "BJones"), and (2) if you've got names with multiple capitals like "McArthur", they will be split into separate pieces and you'll need to put them back together.

- Eric

On Aug 4, 2011, at 12:54 PM, Rodini, Mark wrote:

> Greetings,
>
> I have a simple question. I have a list of strings representing names which lack any spaces and I'm trying to insert a space in the correct place or places to split out the names.
> For example, I might have:
>
> JohnPaulJones
>
> Which I'd like to turn into
>
> John Paul Jones
>
> The rule is to insert a space before any upper case letter followed by a lower case.
>
> gen teststring = regexs(1) if regexm(var,"^([A-Z][a-z]+)")
>
> gives the first word. I think I could do the following to get John Paul
>
> gen teststring = regexs(1) + " " + regexs(2) if regexm(var,"^([A-Z][a-z]+)([A-Z][a-z]+)")
>
> The difficulty I'm having is that the number of subnames in a string is variable. The example above has three subnames, but I might have one with two or four, etc. I'm not sure how to program that.
>
> Thanks for any help.
> Mark
>
> ----------------------------------------------
> Mark Rodini
> COMPASS LEXECON
> 1111 Broadway, Suite 1500
> Oakland, CA 94607
> 510-285-1258 (direct)
> 510-285-1240 (main)
> 510-285-1245 (fax)
> mrodini@compasslexecon.com
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
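
If the goal is a single spaced-out string rather than separate variables, and the number of subnames is not known in advance, one possibility is to keep splitting until no lowercase-then-capital boundary is left. The following is only a sketch under assumptions that are not in the thread: the variable name spaced is made up for illustration, and the rule used is "put a space between a lowercase letter and the capital that follows it", which is slightly different from the rule as stated in the question. It uses only regexm(), regexs(), and count.

```stata
clear
input str20 var
"JasonBJones"
"JohnPaulJones"
"GenePSmith"
"JayRyanMcArthur"
end

* spaced is a hypothetical name for the result variable
gen spaced = var

* count the observations that still contain a lowercase letter
* immediately followed by a capital
count if regexm(spaced, "[a-z][A-Z]")
local todo = r(N)

while `todo' > 0 {
    * .* is greedy, so each pass splits at the last remaining boundary
    replace spaced = regexs(1) + " " + regexs(2) ///
        if regexm(spaced, "^(.*[a-z])([A-Z].*)")
    count if regexm(spaced, "[a-z][A-Z]")
    local todo = r(N)
}

list var spaced
```

Each pass adds exactly one space to every observation that still has a boundary, so the loop stops by itself however many subnames there are. Under this rule "JohnPaulJones" should come out as "John Paul Jones", a middle initial stays attached to the following name ("Jason BJones"), and "McArthur" is still split into "Mc Arthur", so the same caveats as in the loop above apply.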

**References**: **st: regular expression -split string an unknown number of times** *From:* "Rodini, Mark" <mrodini@compasslexecon.com>
