Re: st: Another question regarding string variables

From   Nick Cox
Subject   Re: st: Another question regarding string variables
Date   Wed, 27 Feb 2013 10:30:10 +0000

Often overlooked in this territory are -egen- functions for strings.
The tale starts with  -head()- and -tail()- from 1999:

STB-50  dm70  . . . . . . . . . . . . . . . . Extensions to generate, extended
        (help egenodd if installed) . . . . . . . . . . . . . . . .  N. J. Cox
        7/99    pp.9--17; STB Reprints Vol 9, pp.34--45
        24 additional egen functions presented; includes various string,
        data management, and statistical functions;
        many of the egen functions added to Stata 7

These were implemented in official Stata version 7 as different
_options_ to -egen-'s -ends()- function, but I still prefer my
original syntax.

egen first = ends(name), head
egen last = ends(name), tail

should do what Michael wants. Repeated use of -word()- and regular
expressions are better tricks in general, but these functions exist tailor-made for the job.


[previous posts combined and edited]

Michael Stewart

> Thank you very much, Kieran and Steve, for the timely help.
> All functions are working.

Kieran McCaul

You can do this with regular expressions:

clear *
input str20 name
"John Howard R"
end
* first word
gen first = regexs(1) if regexm(name, "([a-zA-Z]+)[ ]*([a-zA-Z]+)[ ]*[a-zA-Z]")
* second word and the following initial, joined by a blank
gen last = regexs(2) + " " + regexs(3) if regexm(name, "([a-zA-Z]+)[ ]*([a-zA-Z]+)[ ]*([a-zA-Z])")

or you could do it with the word() function:

clear *
input str20 name
"John Howard R"
end
* assumes exactly three words: the first, then words 2 and 3 joined by a blank
gen first = word(name,1)
gen last = word(name,2) + " " + word(name,3)
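Both versions above assume a three-part name, so a longer name such as "Mary Ann Smith Jones" would be truncated. A minimal sketch that keeps everything after the first word, whatever the number of words, combines -word()- with -substr()-. It assumes -name- has no leading blanks; the variable -rest- and the extra example names are made up for illustration:

```stata
clear *
input str30 name
"John Howard R"
"Howard James R"
"Mary Ann Smith Jones"
end
* first word, as returned by word()
gen first = word(name, 1)
* everything after the first word, with the separating blank(s) removed;
* assumes -name- has no leading blanks
gen rest = ltrim(substr(name, length(first) + 1, .))
list
```

Giving -substr()- a missing length (.) returns everything from the starting position to the end of the string, which is what makes this work for any number of words.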

Steve Nakoneshny

>> I don't have access to the help file from my phone, but I'm fairly certain you should be able to extract *any* word from a string var using the -word- function.
>> Completely untested off the top of my head (with no recollection of the appropriate syntax):
>> g lname = word(yourvar,1)
>> g fname = word(yourvar,2) + " " + word(yourvar,3)
>> The above is an inelegant means of approximating your needs. Adjusting for valid syntax would be a good start. I have no doubt that there are other string function solutions that would equally suffice.
>> If you are wedded to using -split-, you may wish to insert a comma between words 1 & 2 of your string via -subinstr- and then proceed with -split yourvar,parse(,)-.
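The -subinstr()- plus -split- route can be sketched as follows, assuming the names contain no commas of their own. The fourth argument of -subinstr()- limits the replacement to the first blank, so -split- then sees exactly two pieces; the temporary variable -tmp- and the stub -part- are arbitrary, hypothetical names:

```stata
clear *
input str20 name
"Howard James R"
end
* replace only the first blank with a comma (the final argument 1
* limits -subinstr()- to a single substitution)
gen tmp = subinstr(name, " ", ",", 1)
* split on the comma: part1 holds the first word, part2 the rest
split tmp, parse(",") generate(part)
drop tmp
list
```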

Michael Stewart

-word()- will give me the second word.

But what I am trying to get is the first word as one variable and the rest
of the string as a second variable.

For example: John Howard R --> John & (Howard R) as two strings, AND not as
John & Howard & R separately as three strings.

Steve Nakoneshny

There is a string function called -word()- that will serve your
purpose. See -h word()- for more details.

Michael Stewart

>>>>> I am trying to find out whether there is a function to split a string "Howard
>>>>> James R" --> "Howard" & ("James R").
>>>>> If I use -split-, I would get Howard, James and R, which is not what I want.
>>>>> I want to split the string after the first word into two string
>>>>> variables: the first variable containing the first word and the second variable
>>>>> containing the rest of the string.
© Copyright 1996–2014 StataCorp LP