st: copy part of a string

From   ChrisAnsen
Subject   st: copy part of a string
Date   Sat, 15 Oct 2011 18:21:39 +0200

Dear all

I ran into an issue with Stata today.

I have a dataset with over 1000 string values of the following type:

1. "M200B + M201 + B001"
2. "M200B + M201"
3. "M200 + M300"
4. ... and so on.

Now I want to read the first part of the string, for example "M200B", and
put it in a new column, and then do the same for the second and third
parts where applicable.

I am doing this by using the command:

gen code1_1 = regexs(1) if regexm(code1, "(([a-zA-Z]+[0-9]+[0-9]+[0-9][a-zA-Z])|([a-zA-Z]+[0-9]+[0-9]+[0-9])")

This gets me what I want: the part before the + sign.

Now I want what comes after the + sign, and I am doing it by using the
following command:

gen code1_2 = regexs(2) if regexm(code1, "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z]))")

This gives the value if it has the form "M200B", but after adding an OR
and changing the command to:

gen code1_2 = regexs(2) if regexm(code1, "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z])|([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9])")

I get an error saying that something is out of range, or something similar.

Can someone tell me where I am making the mistake, or whether there is
another way to do it?

I thought of using a dummy variable as an intermediate step, but I do not
like the idea because later, when I have six codes in one string
("M2008 + .......+M20"), it will get messy, and it should be doable the
"correct" way.
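
One alternative that avoids regular expressions, offered as a minimal sketch rather than a definitive answer, is Stata's split command, which can cut the string at each + sign. The sketch assumes the variable is called code1 as in the commands above and that + appears only as a separator. The stub code1_ and the three example rows are chosen here purely for illustration.

```stata
* Illustrative data only: three rows in the format described above.
clear
input str30 code1
"M200B + M201 + B001"
"M200B + M201"
"M200 + M300"
end

* Cut code1 at every "+"; the pieces go into code1_1, code1_2, ...
* (the stub code1_ is an assumption; the new names must not already exist).
split code1, parse("+") generate(code1_)

* The pieces still carry the blanks that surrounded each "+",
* so strip leading and trailing spaces from every new variable.
foreach v of varlist `r(varlist)' {
    replace `v' = trim(`v')
}

list
```

split creates as many new variables as the longest string needs and leaves the extra ones empty for shorter strings, so the same lines should carry over unchanged to strings with six codes.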

Also, I know how to tidy this up, for example where [0-9] is repeated, so
please do not offer advice along those lines :)

Thank you all in advance

Best regards
Christina Christiansen, DK
