st: RE: Split a variable that has no markers


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Split a variable that has no markers
Date   Thu, 14 Oct 2004 20:18:50 +0100

You are correct about the -split- command (not option). 
It doesn't do what you want, because the tools for 
what you want had been in official Stata for a long 
time before -split- was a gleam in anybody's eye. 

And -index()-, -substr()- and -length()- will do what you want. 

As I understand it, you want, for example, the three
digits before the "-" to be variable 3. You 
just need to work through the steps. 

Where is the "-"?  

index(original, "-") 

Where is 3 characters before that? 

index(original,"-") - 3 

How much do you want to extract? 

3 characters

So 

gen var3 = substr(original,index(original,"-")-3,3) 

You can test this with an example: 

di substr("9002346-A",index("9002346-A","-")-3, 3) 
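The same steps can be checked one at a time on the literal example. This is only a sketch: it uses -strpos()-, which is the name under which more recent versions of Stata document -index()-; if your Stata is older, substitute -index()-.

```stata
* Step 1: where is the "-"?
display strpos("9002346-A", "-")
* shows 8

* Step 2: where is 3 characters before that?
display strpos("9002346-A", "-") - 3
* shows 5

* Step 3: extract 3 characters starting there
display substr("9002346-A", strpos("9002346-A", "-") - 3, 3)
* shows 346
```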

In fact, it seems that you want just 

substr(original,-5,3) 

where -5 means "count 5 backwards from the end". 
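As a quick check on the negative start position, here is a small sketch comparing it with the equivalent position counted from the front. The example value has 9 characters, so 5 from the end is position 9 - 4 = 5.

```stata
* Count 5 backwards from the end, then take 3 characters
display substr("9002346-A", -5, 3)
* shows 346

* The same position worked out from the length of the string
display substr("9002346-A", length("9002346-A") - 4, 3)
* shows 346
```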

Similarly var4 looks like substr(original,-1,1) 
and var1 looks like 

substr(original,1,length(original)-8) 
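Putting the pieces together, a minimal sketch on the two example values might look like this. It assumes every value ends in three digits, three more digits, a "-" and a single character. The line for var2 is not spelled out above; substr(original,-8,3) is my guess by analogy with the others. These also play the role of Excel's "left" and "right": var1 takes characters from the left, var4 from the right.

```stata
* Enter the two example values as a string variable
clear
input str12 original
"9002346-A"
"15120657-4"
end

* Everything before the last 8 characters (like "left" in Excel)
gen var1 = substr(original, 1, length(original) - 8)
* Assumption: 3 characters starting 8 from the end
gen var2 = substr(original, -8, 3)
* The 3 characters before the "-"
gen var3 = substr(original, -5, 3)
* The last character (like "right" in Excel)
gen var4 = substr(original, -1, 1)

list
```

For the two values shown, that should give 9, 002, 346, A and 15, 120, 657, 4.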

In addition to the manuals, there was some discussion in 

On getting functions to do the work. Stata Journal 
2(4):411--427 (2002) 

Nick 
[email protected] 

Julia A. Gamas

> I have an alphanumeric (string) variable such as 9002346-A or
> 15120657-4.  I need to split this up into smaller groups of
> digits as follows:
> 
> Original      variable 1   variable 2   variable 3   variable 4
> 9002346-A     9            002          346          A
> 15120657-4    15           120          657          4
> 
> The split option seems to only allow me to do this if there are
> spaces (or other markers, such as commas) within that variable
> (i.e. 9 002 346-A or 9,002,346-A), which is not my case.
> I can't find an option within "strfun" which may help, unless I've
> misunderstood the instructions for "char(n)" and "index".
> Also, note that 9 and 15 are different in size, adding a bit of
> complication to the issue.
> What I want to do can be done in Excel (for smaller files) using the
> "left(variable location, number of characters)" and "right"
> functions.  Does anybody know if Stata has functions akin to "left"
> and "right" from Excel?  I can't seem to find any in the online help
> or Stata archives.
