


RE: st: How to get rid of leading and trailing letters and symbols?

From   Nick Cox
Subject   RE: st: How to get rid of leading and trailing letters and symbols?
Date   Wed, 26 Oct 2011 13:38:44 +0100

I agree with Uli in recommending regular expression machinery. Given these data, 

. l

     |                             example |
  1. |                /profile/?id=9596986 |
  2. | /profile/?id=9591886&reftype=detail |

-moss- (SSC) is, as mentioned very recently on this list, a wrapper for Stata's regex functions. It can give you more output than you need, but you just discard what you don't want. This finds numbers, meaning runs of the digits 0-9:

. moss example, match(([0-9]+)) regex

. l

     |                             example   _count   _match1   _pos1 |
  1. |                /profile/?id=9596986        1   9596986      14 |
  2. | /profile/?id=9591886&reftype=detail        1   9591886      14 |
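Since -moss- wraps Stata's own regex functions, the same extraction can be sketched with the built-in -regexm()- and -regexs()- alone, for anyone who prefers not to install a package. This is an illustrative sketch, not what -moss- does internally; it rebuilds the two example strings from scratch (so it starts with -clear-), and the variable names -idstr- and -id- are hypothetical. -regexs(1)- returns whatever the first parenthesised group matched in the most recent -regexm()- call, much as \1 does in Emacs:

```stata
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* regexm() tests the pattern; regexs(1) then returns what the
* first (...) group matched, here the first run of digits
gen idstr = regexs(1) if regexm(example, "([0-9]+)")

* convert to numeric; double keeps long ids exact
gen double id = real(idstr)
format id %12.0f

list example id
```

Storing -id- as double (or long) matters: the default float type holds integers exactly only up to 16,777,216, so longer ids would be silently rounded.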

and there are all sorts of ways of subdividing according to position, with or without regular expressions. A criterion for the number being at the end is that the last character of the string is numeric, which is

. gen atend = !missing(real(substr(example,-1,1)))

. l

     |                             example   number~d   _count   _match1   _pos1   atend |
  1. |                /profile/?id=9596986    9596986        1   9596986      14       1 |
  2. | /profile/?id=9591886&reftype=detail                   1   9591886      14       0 |
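The end-of-string test can also be written as a regular expression anchored with $, which has the side benefit of capturing the trailing digits when they are present. A minimal sketch, assuming the same -example- variable rebuilt from scratch; -atend_re- and -idend- are hypothetical names. On these data the regex flag should agree with the -atend- test above:

```stata
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* 1 if the string ends in one or more digits, 0 otherwise
gen byte atend_re = regexm(example, "[0-9]+$")

* the last-character test, for comparison
gen byte atend = !missing(real(substr(example, -1, 1)))

* the digits that end the string; empty ("") if there are none
gen idend = regexs(1) if regexm(example, "([0-9]+)$")

list example atend atend_re idend
```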


Ulrich Kohler wrote:

You should get that using regular expressions (see help regexp). I don't
use regular expressions very often in Stata, but in my favourite editor,
Emacs, the regular expression to find a number of arbitrary length
would be 


which would store the number in \1. The Stata regular expression should
work very similarly. 

On Wednesday, 26.10.2011, at 10:37 +0100, Ekaterina Hertog wrote:

> I have got a dataset where the id variable is a part of a web-link. It 
> can contain letters followed by the id number: (e.g. 
> /profile/?id=9596986) or it can contain the id number in the middle 
> (e.g. /profile/?id=9591886&reftype=detail). I need to create a variable 
> which will only contain the number that is part of the id variable. I 
> also need to be able to distinguish between the cases where the number 
> is trailing vs. cases where it is in the middle. I looked at the advice 
> available on removing leading or trailing 0s in Stata 11 
> (, but in my 
> case I cannot actually specify the letters and symbols that lead or 
> trail so I am stuck. I use Stata 11.

