From
Nick Cox <n.j.cox@durham.ac.uk>

To
"'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject
RE: st: How to get rid of leading and trailing letters and symbols?

Date
Wed, 26 Oct 2011 13:38:44 +0100

I agree with Uli in recommending regular expression machinery. Given these data,

. l

     +-------------------------------------+
     |                             example |
     |-------------------------------------|
  1. |                /profile/?id=9596986 |
  2. | /profile/?id=9591886&reftype=detail |
     +-------------------------------------+

-moss- (SSC) is, as mentioned very recently on this list, a wrapper for Stata's regex functions. It can give you more output than you need, but you can just discard what you don't want. This finds numbers made up of the digits 0-9:

. moss example, match(([0-9]+)) regex

. l

     +----------------------------------------------------------------+
     |                             example   _count   _match1   _pos1 |
     |----------------------------------------------------------------|
  1. |                /profile/?id=9596986        1   9596986      14 |
  2. | /profile/?id=9591886&reftype=detail        1   9591886      14 |
     +----------------------------------------------------------------+

There are all sorts of ways of subdividing according to position, with or without regular expressions. One criterion for a number at the end is that the last character of the string is numeric, which is

. gen atend = !missing(real(substr(example,-1,1)))

. l

     +-----------------------------------------------------------------------------------+
     |                             example   number~d   _count   _match1   _pos1   atend |
     |-----------------------------------------------------------------------------------|
  1. |                /profile/?id=9596986    9596986        1   9596986      14       1 |
  2. | /profile/?id=9591886&reftype=detail                   1   9591886      14       0 |
     +-----------------------------------------------------------------------------------+

Nick
n.j.cox@durham.ac.uk

Ulrich Kohler

You should get that using regular expressions (see help regexp). I don't use regular expressions very often in Stata, but in my favourite editor, Emacs, the regular expression to find a number of arbitrary length would be \([0-9]+\), which would store the number in \1. The Stata regular expression should work very similarly.
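For anyone without -moss-, the same extraction can be done with Stata's own regex functions, as Uli suggests. This is a minimal sketch, not code from the thread: it types in the two example values with -input-, and the variable names id, atend and idnum are hypothetical. regexm() tests whether a pattern matches and regexs(1) then returns the text captured by the first ( ) group; anchoring the pattern with $ gives the same "number at the end" indicator as Nick's last-character test.

```stata
* Minimal sketch using only Stata's built-in regex functions (see -help regexp-).
* The data are the two example values from this thread; variable names are hypothetical.
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* regexm() reports whether the pattern matches; regexs(1) returns the text
* captured by the first ( ) group of that regexm() call.
* Stata writes groups as ( ), not Emacs-style \( \).
gen id = regexs(1) if regexm(example, "([0-9]+)")

* 1 if a run of digits ends the string ($ anchors the match at the end), else 0
gen byte atend = regexm(example, "[0-9]+$")

* Numeric copy of the id; double storage keeps long ids exact
gen double idnum = real(id)
format idnum %12.0f

list
```

Note that "([0-9]+)" captures the first run of digits anywhere in the string, so if other digits could appear earlier in a URL, a more specific pattern such as "id=([0-9]+)" would be safer.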
On Wednesday, 26.10.2011, at 10:37 +0100, Ekaterina Hertog wrote:
> I have a dataset where the id variable is part of a web link. It
> can contain letters followed by the id number (e.g.
> /profile/?id=9596986) or it can contain the id number in the middle
> (e.g. /profile/?id=9591886&reftype=detail). I need to create a variable
> which will only contain the number that is part of the id variable. I
> also need to be able to distinguish between cases where the number
> is trailing and cases where it is in the middle. I looked at the advice
> available on removing leading or trailing 0s in Stata 11
> (http://www.stata.com/support/faqs/data/leadingzeros.html), but in my
> case I cannot actually specify the letters and symbols that lead or
> trail, so I am stuck. I use Stata 11.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
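Ekaterina's two cases can also be told apart without regular expressions, one of the routes Nick mentions. The sketch below rests on an assumption not stated in the thread: that the id always follows the literal text "id=" and runs either to the first "&" or to the end of the string. The helper variables rest and amp and the indicator middle are hypothetical names chosen for illustration.

```stata
* Minimal sketch without regular expressions, ASSUMING every id follows the
* literal text "id=" and ends at the first "&" or at the end of the string.
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* Everything after "id=" (a missing length in substr() runs to the end)
gen rest = substr(example, strpos(example, "id=") + 3, .) if strpos(example, "id=") > 0

* Position of the first "&" in what remains; 0 means there is none
gen amp = strpos(rest, "&")

* The id is the text before "&" if there is one, otherwise all of rest
gen id = cond(amp > 0, substr(rest, 1, amp - 1), rest)

* 1 if something follows the id (middle), 0 if the id is trailing
gen byte middle = amp > 0 if !missing(rest)

drop rest amp
list
```

The string functions keep the logic explicit, at the price of hard-wiring the "id=" and "&" delimiters; the regex route is more flexible when those delimiters vary.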

