

From: Elena Vidal <maria.vidal@duke.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Re: How do I split my string variable by capital letters?
Date: Sat, 26 Nov 2011 18:30:29 +0000
Thanks, Joseph. This really worked! I had to do it backwards though: first replace the carriage-return with another character (some lines in the middle of the observation were empty and splitting by carriage return did not account for those missing lines that were indeed valuable information).

Thanks so much!
Elena

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joseph Coveney
Sent: Wednesday, November 23, 2011 10:59 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: How do I split my string variable by capital letters?

Elena Vidal wrote:

I'm having a bit of a problem splitting a string variable. An example of the variable reads:

var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"

This corresponds to a single cell. That means: all 5 lines appear in 1 cell in my data, and I want to split it up so that each line is a different variable.

I've tried splitting it with this code:

split var12, gen(var12b)

but that didn't work. I'd appreciate the help trying to sort this out!

--------------------------------------------------------------------------------

Are you saying that there are carriage-return/line-feed characters in the cell? If so, then you can still use -split-. You just need to specify the -parse()- option. See the illustration below.

I've seen this before when retrieving data from prettified Excel workbooks. Before using -split-, you should double-check what you have in there that's delimiting lines. Sometimes it's only a line-feed character or a carriage-return character, and not both.

An earlier thread this week (about the perennial problem of ASCII character 160) contains recommendations that are applicable to identifying nonprinting characters in your string variables. Take a look at that thread for how to determine what the line delimiter is in your dataset.

Joseph Coveney

. * Set up
. set obs 1
obs was 0, now 1

. input str54 var12

                                                      var12
  1. "Startup/Seed?Early Stage?Expansion?Expansion?Expansion"

. replace var12 = subinstr(var12, "?", "`=char(13)'`=char(10)'", .)
var12 was str54 now str58
(1 real change made)

. tempfile tmpfil0

. outsheet using "`tmpfil0'", names quote

.
. * Looks like this
. type "`tmpfil0'"
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"

.
. * Solution
. split var12, generate(var12b) parse("`=char(13)'`=char(10)'")
variables created as string:
var12b1  var12b2  var12b3  var12b4  var12b5

. list var12b?

     +----------------------------------------------------------------+
     |      var12b1       var12b2     var12b3     var12b4     var12b5 |
     |----------------------------------------------------------------|
  1. | Startup/Seed   Early Stage   Expansion   Expansion   Expansion |
     +----------------------------------------------------------------+

.
.
. exit

end of do-file

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

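The two steps discussed in this message (checking which characters delimit the lines, and replacing the delimiter with another character before splitting) might look roughly like the sketch below. It is only an illustration under stated assumptions: the toy data, the placeholder "|" (which must not occur in the real values), and the names has_cr, has_lf, and tmp are all hypothetical and do not come from the thread. It also assumes the delimiter turns out to be carriage return plus line feed.

```stata
* Sketch only: toy data, the "|" placeholder, and the names
* has_cr, has_lf, and tmp are assumptions, not from the thread.
clear
set obs 1

* One cell with three lines, the middle one empty (CR+LF delimited)
generate str80 var12 = "Startup/Seed" + char(13) + char(10) ///
    + char(13) + char(10) + "Expansion"

* Step 1: check which line-delimiter characters are actually present
generate byte has_cr = strpos(var12, char(13)) > 0
generate byte has_lf = strpos(var12, char(10)) > 0
list has_cr has_lf

* Step 2: swap the delimiter for a visible placeholder, then split on it
generate tmp = subinstr(var12, char(13) + char(10), "|", .)
split tmp, generate(var12b) parse("|")
list var12b*
```

If has_cr or has_lf comes back 0, adjust the string passed to subinstr() accordingly. After the split, it is worth listing the new variables to confirm that empty lines ended up where you expect them.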
**References**:

- **st: How do I split my string variable by capital letters?** *From:* Elena Vidal <maria.vidal@duke.edu>
- **st: Re: How do I split my string variable by capital letters?** *From:* "Joseph Coveney" <jcoveney@bigplanet.com>
