
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Re: How do I split my string variable by capital letters?


From   "Joseph Coveney" <[email protected]>
To   <[email protected]>
Subject   st: Re: How do I split my string variable by capital letters?
Date   Thu, 24 Nov 2011 12:58:48 +0900

Elena Vidal wrote:

I'm having a bit of a problem splitting a string variable.

An example of the variable reads:
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"

This corresponds to a single cell: all 5 lines appear in 1 cell in my data,
and I want to split it up so that each line becomes a separate variable.

I've tried splitting it with this code:

split var12, gen(var12b)

but that didn't work. 

I'd appreciate the help trying to sort this out!

--------------------------------------------------------------------------------

Are you saying that there are carriage-return/line-feed characters in the cell?
If so, then you can still use -split-.  You just need to specify the -parse()-
option.  See the illustration below.  I've seen this before when retrieving data
from prettified Excel workbooks.  
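The sketch below uses hypothetical data: a three-line cell whose breaks are CR+LF. It suggests why the original -split var12, gen(var12b)- call did not work. Without -parse()-, -split- breaks the string at spaces only. As a result, "Early Stage" is cut in two, and the line breaks stay inside the pieces.

```stata
* Hypothetical data: lines separated by CR+LF, as in the question
clear
set obs 1
generate var12 = "Startup/Seed" + char(13) + char(10) + "Early Stage" ///
    + char(13) + char(10) + "Expansion"

* No parse() option: -split- breaks the string at spaces only
split var12, generate(var12b)
* Two pieces result: "Startup/Seed<CR><LF>Early" and "Stage<CR><LF>Expansion"
```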

Before using -split-, you should double-check which characters are delimiting
the lines.  Sometimes it's only a line-feed character, char(10), or only a
carriage-return character, char(13), and not both.  An earlier thread this week
(about the perennial problem of character 160, the nonbreaking space) contains
recommendations for identifying nonprinting characters in string variables.
Take a look at that thread for how to determine what the line delimiter is in
your dataset.
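One way to run that check in Stata is sketched below. The data are hypothetical: a one-observation var12 whose lines are separated by line feeds only. -strpos()- tests whether each candidate character occurs, and -count- tallies the observations that contain it. The Mata line lists the character codes of observation 1 so that any stray delimiter is visible.

```stata
* Hypothetical example data: lines separated by line feeds only
clear
set obs 1
generate var12 = "Startup/Seed" + char(10) + "Early Stage" + char(10) + "Expansion"

* char(13) is carriage return (CR); char(10) is line feed (LF)
count if strpos(var12, char(13) + char(10)) > 0   // cells with CR+LF pairs
count if strpos(var12, char(10)) > 0              // cells with any LF
count if strpos(var12, char(13)) > 0              // cells with any CR

* Character codes of observation 1; 10 and 13 mark the line breaks
mata: ascii(st_sdata(1, "var12"))
```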

Joseph Coveney

. * Set up
. set obs 1
obs was 0, now 1

. input str54 var12

                                                      var12
  1. "Startup/Seed?Early Stage?Expansion?Expansion?Expansion"

. replace var12 = subinstr(var12, "?", "`=char(13)'`=char(10)'", .)
var12 was str54 now str58
(1 real change made)

. tempfile tmpfil0

. outsheet using "`tmpfil0'", names quote

. 
. * Looks like this
. type "`tmpfil0'"
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"

. 
. * Solution
. split var12, generate(var12b) parse("`=char(13)'`=char(10)'")
variables created as string: 
var12b1  var12b2  var12b3  var12b4  var12b5

. list var12b?

     +----------------------------------------------------------------+
     |      var12b1       var12b2     var12b3     var12b4     var12b5 |
     |----------------------------------------------------------------|
  1. | Startup/Seed   Early Stage   Expansion   Expansion   Expansion |
     +----------------------------------------------------------------+

. 
. 
. exit

end of do-file
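The line delimiter may differ from cell to cell: CR+LF in some, a bare LF or a bare CR in others. One option, sketched here with hypothetical data, is to convert every variant to a single line feed first and then pass just that one character to -parse()-. The order of the two -replace- commands matters. CR+LF pairs are collapsed before any remaining lone CR is converted, so a Windows-style line ending does not turn into two breaks.

```stata
* Hypothetical data: one cell with mixed line endings (CR+LF, LF, CR)
clear
set obs 1
generate var12 = "Startup/Seed" + char(13) + char(10) + "Early Stage" ///
    + char(10) + "Expansion" + char(13) + "Expansion"

* Collapse CR+LF pairs to LF first, then turn any lone CR into LF
replace var12 = subinstr(var12, char(13) + char(10), char(10), .)
replace var12 = subinstr(var12, char(13), char(10), .)

* Every line break is now a single LF, so parse on that alone
split var12, generate(var12b) parse("`=char(10)'")
list var12b?
```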



