Re: st: Re: problem with split command


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Re: problem with split command
Date   Wed, 29 Feb 2012 09:20:35 +0000

Joseph is naturally right. In addition,

1. The help for -split- gives an example in which parsing is on ")"
but it is desired to keep the ")". The answer is simply that if you
use -split- in this way, you must put the parse strings back yourself.
This is similar to your problem.
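
As a rough illustration of point 1 (a sketch only, using the example values from the original question and the variable name year_survey that the poster asked for), -split- drops the parse string from the pieces, so the "2" has to be prepended again:

```stata
clear
input str30 state_name
"Andhra2006"
"Arunachal2005"
"Assam2006"
"Bihar2007"
"UttarPradesh2009"
end

* -split- discards the parse string, so statename2 holds "006", "005", ...
split state_name, parse(2) generate(statename)

* put the "2" back by hand and convert the result to a number
generate int year_survey = real("2" + statename2)

list statename1 statename2 year_survey, noobs
```

This relies on "2" appearing exactly once in each value, which holds for 2005 to 2009; a value such as Andhra2012 would be cut at both 2s.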

2. The main point is that -split- is not designed directly for this
kind of problem because when it was introduced there were already
several ways to use existing string functions [N.B., not commands] to
solve that kind of problem easily. Joseph has mentioned one. Here's
another:

gen numeral = real(substr(state_name, -4, 4))
gen state = substr(state_name, 1, length(state_name) - 4)

Once -numeral- exists,

gen state = subinstr(state_name, string(numeral), "", .)

is another way to get -state- (-numeral- is numeric, hence the string()).

Here's another (-numeral- is a string this time):

gen numeral = substr(state_name, strpos(state_name, "2"), .)
gen state = substr(state_name, 1, strpos(state_name, "2") - 1)
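
Regular expressions offer a further string-function route. The following is a minimal sketch, not code posted in this thread: it assumes each value is a run of letters followed by a run of digits, and the names state and year_survey are only illustrative.

```stata
clear
input str30 state_name
"Andhra2006"
"Arunachal2005"
"Assam2006"
"Bihar2007"
"UttarPradesh2009"
end

* letters first, digits last; regexs(n) returns the nth parenthesized piece
generate state = regexs(1) if regexm(state_name, "^([A-Za-z]+)([0-9]+)$")
generate int year_survey = real(regexs(2)) if regexm(state_name, "^([A-Za-z]+)([0-9]+)$")

* values that do not fit the assumed pattern are left missing
assert !missing(year_survey)
```

Unlike parsing on "2", this does not depend on which digit the year starts with, but a name containing spaces or punctuation would fail the match and need a looser pattern.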

Nick

On Wed, Feb 29, 2012 at 3:49 AM, Joseph Coveney <[email protected]> wrote:

> Forgot to mention:  for this year's survey and afterward, try the alternative below.  You can use Stata's regular expressions, too.
>
>
> . input str30 state_name
>
>                         state_name
>  1. "Andhra2012"
>  2. "Arunachal2012"
>  3. "Assam2012"
>  4. "Bihar2012"
>  5. "UttarPradesh2012"
>  6. end
>
> .
> . generate byte first_numeral = indexnot(state_name, "`c(alpha)'`c(ALPHA)'")
>
> . generate long year = real(substr(state_name, first_numeral, .))
>
> . replace state_name = substr(state_name, 1, first_numeral - 1)
> (5 real changes made)
>
> .
> . list, noobs separator(0) abbreviate(20)
>
>  +-------------------------------------+
>  |   state_name   first_numeral   year |
>  |-------------------------------------|
>  |       Andhra               7   2012 |
>  |    Arunachal              10   2012 |
>  |        Assam               6   2012 |
>  |        Bihar               6   2012 |
>  | UttarPradesh              13   2012 |
>  +-------------------------------------+
>
> .
> . exit
>
> end of do-file

Joseph Coveney

You're almost there:  finish the job by concatenating "2" and statename2:

generate int year = real("2" + statename2)


Prakash Singh wrote:

I need help with the -split- command. I am working with Stata 10.
I am working with survey data on Indian states. In the survey data the
variable state_name combines the state's name with the year in which
the state was surveyed, in this case 2005 to 2009. So the state_name
variable looks like...
Andhra2006
Arunachal2005
Assam2006
Bihar2007
UttarPradesh2009

and so on.
Now I would like to create two separate variables out of it i.e.
state_name and year_survey.

I have used the following command:
split state_name, parse(2) gen(statename)

But the problem I am facing is that the statename2 variable, which is
actually the year variable, comes out without the leading 2, i.e.,
005, 006, etc.

Please advise. I have read the -split- help and Statalist postings
on -split- but could not work it out.
