RE: st: How to get rid of leading and trailing letters and symbols?

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: How to get rid of leading and trailing letters and symbols?
Date	Wed, 26 Oct 2011 13:38:44 +0100

I agree with Uli in recommending regular expression machinery. Given these data, 

. l

     +-------------------------------------+
     |                             example |
     |-------------------------------------|
  1. |                /profile/?id=9596986 |
  2. | /profile/?id=9591886&reftype=detail |
     +-------------------------------------+

-moss- (SSC) is, as mentioned very recently on this list, a wrapper for Stata's regex functions. It can give you more output than you need, but you just discard what you don't want. This finds numbers based on digits 0-9:  

. moss example, match(([0-9]+)) regex

. l

     +----------------------------------------------------------------+
     |                             example   _count   _match1   _pos1 |
     |----------------------------------------------------------------|
  1. |                /profile/?id=9596986        1   9596986      14 |
  2. | /profile/?id=9591886&reftype=detail        1   9591886      14 |
     +----------------------------------------------------------------+

There are also all sorts of ways of subdividing according to position, with or without regular expressions. One criterion for the number being at the end is that the last character of the string is numeric, which can be tested with 

. gen atend = !missing(real(substr(example,-1,1)))

. l

     +-----------------------------------------------------------------------------------+
     |                             example   number~d   _count   _match1   _pos1   atend |
     |-----------------------------------------------------------------------------------|
  1. |                /profile/?id=9596986    9596986        1   9596986      14       1 |
  2. | /profile/?id=9591886&reftype=detail                   1   9591886      14       0 |
     +-----------------------------------------------------------------------------------+
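
If -moss- is not installed, much the same can be done with Stata's built-in regex functions directly. What follows is only a minimal sketch: the two observations are retyped from the listing above, and the variable names -idnum- and -atend_re- are illustrative choices, not anything created earlier in this thread.

```stata
* Recreate the two example observations (retyped from the listing above)
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* regexm() returns 1 if the pattern matches; regexs(1) then returns the
* text matched by the first parenthesised subexpression
gen idnum = regexs(1) if regexm(example, "([0-9]+)")

* $ anchors the pattern at the end of the string, so this is 1 only
* when the last character is a digit
gen byte atend_re = regexm(example, "[0-9]$")

list
```

Here -idnum- is a string variable holding the first run of digits in each value, and -atend_re- carries the same information as -atend- above.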


Nick 
[email protected] 

Ulrich Kohler

You should get that using regular expressions (see help regexp). I don't
use regular expressions very often in Stata, but in my favourite editor,
Emacs, the regular expression to find a number of arbitrary length
would be 

\([0-9]+\)

which would store the number in \1. The Stata regular expression should
work very similarly. 
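
A rough sketch of the Stata counterpart, assuming a string variable called -example- as in the question: the group is written with plain parentheses, and regexs(1) plays the role of \1. The name -id- and the one-observation setup are hypothetical, there only to make the lines runnable.

```stata
* One observation, retyped from the question, so the sketch runs on its own
clear
set obs 1
gen str40 example = "/profile/?id=9596986"

* ([0-9]+) captures the first run of digits; regexs(1) returns it as a
* string and real() converts it to a number (long is wide enough for
* the 7-digit ids shown in this thread)
gen long id = real(regexs(1)) if regexm(example, "([0-9]+)")

list
```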


On Wednesday, 26.10.2011, at 10:37 +0100, Ekaterina Hertog wrote:

> I have got a dataset where the id variable is a part of a web-link. It 
> can contain letters followed by the id number: (e.g. 
> /profile/?id=9596986) or it can contain the id number in the middle 
> (e.g. /profile/?id=9591886&reftype=detail). I need to create a variable 
> which will only contain the number that is part of the id variable. I 
> also need to be able to distinguish between the cases where the number 
> is trailing vs. cases where it is in the middle. I looked at the advice 
> available on removing leading or trailing 0s in Stata 11 
> (http://www.stata.com/support/faqs/data/leadingzeros.html), but in my 
> case I cannot actually specify the letters and symbols that lead or 
> trail so I am stuck. I use Stata 11.
