
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: Re: How do I split my string variable by capital letters?


From   Elena Vidal <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: Re: How do I split my string variable by capital letters?
Date   Sat, 26 Nov 2011 18:30:29 +0000

Thanks, Joseph.

This really worked! I had to do it backwards, though: I first replaced the carriage return with another character before splitting. Some lines in the middle of the observation were empty, and splitting by carriage return did not account for those empty lines, which were indeed valuable information.

Thanks so much!
Elena
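
A minimal sketch of the workaround described above, assuming the line breaks are CR+LF pairs, lone carriage returns, or lone line feeds. The placeholder "|", the copy var12c, and the stub line are hypothetical names chosen for illustration and do not come from the original messages.

```stata
* Hypothetical placeholder: "|" must not already occur in var12
assert strpos(var12, "|") == 0

* Work on a copy so that the original variable is left untouched
clonevar var12c = var12

* Replace CR+LF pairs first, then any remaining lone CR or LF
replace var12c = subinstr(var12c, char(13) + char(10), "|", .)
replace var12c = subinstr(var12c, char(13), "|", .)
replace var12c = subinstr(var12c, char(10), "|", .)

* Split on the placeholder instead of on the line-break characters
split var12c, generate(line) parse("|")
list line*
```

After running it, -list- shows whether the empty lines end up in the positions you expect.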


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Joseph Coveney
Sent: Wednesday, November 23, 2011 10:59 PM
To: [email protected]
Subject: st: Re: How do I split my string variable by capital letters?

Elena Vidal wrote:

I'm having a bit of a problem splitting a string variable.

An example of the variable reads:
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"

This corresponds to a single cell. That means: all 5 lines appear in 1 cell in my data, and I want to split it up so that each line is a different variable. 

I've tried splitting it with this code:

split var12, gen(var12b)

but that didn't work. 

I'd appreciate the help trying to sort this out!

--------------------------------------------------------------------------------

Are you saying that there are carriage-return/line-feed characters in the cell?
If so, then you can still use -split-.  You just need to specify the -parse()- option.  See the illustration below.  I've seen this before when retrieving data from prettified Excel workbooks.  

Before using -split-, you should double-check what you have in there that's delimiting lines.  Sometimes it's only a line-feed character or a carriage-return character, and not both.  An earlier thread this week (about the perennial problem of ASCII character 160) contains recommendations that are applicable to identifying nonprinting characters in your string variables.  Take a look at that thread for how to determine what the line delimiter is in your dataset.
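
One way to run that check is sketched below. It assumes the variable is named var12, as in the example, and only counts observations containing each candidate delimiter. It is not necessarily the method recommended in the earlier thread.

```stata
* Assumes the variable is named var12, as in the example above.
* char(13) is carriage return (CR); char(10) is line feed (LF).

* Observations containing the CR+LF pair
count if strpos(var12, char(13) + char(10)) > 0

* Observations containing a carriage return anywhere
count if strpos(var12, char(13)) > 0

* Observations containing a line feed anywhere
count if strpos(var12, char(10)) > 0
```

If the first count is zero but one of the other two is not, the lines are delimited by that single character, and -parse()- should name only that character.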

Joseph Coveney

. * Set up
. set obs 1
obs was 0, now 1

. input str54 var12

                                                      var12
  1. "Startup/Seed?Early Stage?Expansion?Expansion?Expansion"

. replace var12 = subinstr(var12, "?", "`=char(13)'`=char(10)'", .)
var12 was str54 now str58
(1 real change made)

. tempfile tmpfil0

. outsheet using "`tmpfil0'", names quote

. 
. * Looks like this
. type "`tmpfil0'"
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"

. 
. * Solution
. split var12, generate(var12b) parse("`=char(13)'`=char(10)'")
variables created as string: 
var12b1  var12b2  var12b3  var12b4  var12b5

. list var12b?

     +----------------------------------------------------------------+
     |      var12b1       var12b2     var12b3     var12b4     var12b5 |
     |----------------------------------------------------------------|
  1. | Startup/Seed   Early Stage   Expansion   Expansion   Expansion |
     +----------------------------------------------------------------+

. 
. 
. exit

end of do-file


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


