



Re: st: copy part of a string


From   "Dimitriy V. Masterov" <dvmaster@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: copy part of a string
Date   Sat, 15 Oct 2011 13:28:19 -0400

Chris,

Suppose your variable was called x. Then the command would be

split x, parse(" + ")

You can loop over your variables with split if you want to do all of them.
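
A minimal sketch of such a loop, assuming the original variables are all
strings and are the only ones whose names match code* (the code* pattern
and the `v'_ stub are illustrative assumptions, not names taken from your
data, apart from code1 in your example):

```stata
* Assumption: code* matches only the original string variables.
* Drop any earlier attempts (e.g. code1_1, code1_2) before running this,
* otherwise -split- will stop because the new names already exist.
foreach v of varlist code* {
    * generate() sets the stub, so code1 yields code1_1, code1_2, ...
    split `v', parse(" + ") generate(`v'_)
}
```

The varlist is expanded once when the loop starts, so the newly created
variables are not themselves looped over.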

The parse(" + ") form will not work if you have cases without spaces, like
"M200+M300". In that case, use split x, parse("+") instead, and then apply
-trim- to each new variable to get rid of any leading and trailing blanks.
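
As a rough sketch of that second route, assuming the variable is called x
and that no existing variable name starts with "part" (the stub part is a
made-up name for illustration):

```stata
* Split on the bare plus sign, so "M200+M300" and "M200 + M300" both split
split x, parse("+") generate(part)

* The pieces may keep the blanks that surrounded each "+"; strip them
foreach v of varlist part* {
    replace `v' = trim(`v')
}
```

By default -split- only trims the original string as a whole, not the
individual pieces, which is why the separate trim() step is needed.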

DVM

On Sat, Oct 15, 2011 at 12:21 PM, ChrisAnsen <lakridstina@gmail.com> wrote:
> Dear all
>
> I ran into an issue with Stata today.
>
> I have a dataset with over 1000 string variables of the following type:
>
> 1. "M200B + M201 + B001"
> 2.  "M200B + M201"
> 3.  "M200 + M300"
> 4. ...
> 5. and so on.
>
> Now I want to read the first part of the string, example: "M200B" and
> insert it in a new column and then read the second and the third part of
> it if applicable.
>
> I am doing this by using the command:
>
> gen code1_1 = regexs(1) if regexm(code1,
> "(([a-zA-Z]+[0-9]+[0-9]+[0-9][a-zA-Z])|([a-zA-Z]+[0-9]+[0-9]+[0-9])")
>
>
> Now this gets me what I want, namely what is before the + sign.
>
> Now I want what is after the + sign, and I am doing it by using the
> following command:
>
> gen code1_2 = regexs(2) if regexm(code1, "(([+
> ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z]))")
>
> This gives the value if it is in the form of "M200B" and by adding an OR
> and transforming it to:
>
> gen code1_2 = regexs(2) if regexm(code1, "(([+
> ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z])|([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9])")
>
> I am getting an error that it is outside range, or something similar.
>
> Can someone tell me where I am making the mistake, or if there is
> another way to do it?
>
> I thought of using a dummy variable as a mid-step, but I do not like the
> idea because later, when I have six variables "M2008 + .......+M20", it
> will be messy, and it should be doable the "correct" way.
>
> Also, I know how to tidy it up by using [0-9], for example, so
> please do not offer that kind of advice :)
>
> Thank you all in advance
>
> Best regards
> Christina Christiansen, DK
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


