The Stata listserver

st: RE: identifying letters in a string variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: identifying letters in a string variable
Date   Thu, 1 Sep 2005 00:13:42 +0100

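A loose test of whether a value of a string variable 
holds only a number is that -real(strvar)- is not missing, 
always remembering the possibility of missing values. 
The test is loose because -real()- also accepts signs, 
decimal points and exponents, as in "-1.5" or "1e3". 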

real("")

and 

real(".") 

both return numeric missing. 
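
As a small illustration of why the -real()- test is only a loose one, 
the lines below check a few literal strings. This is a sketch, not part 
of the original reply; the example strings are my own. Each -assert- is 
silent when true, so the block runs without output.

```stata
* real() converts a string to a number, or returns missing if it cannot
assert real("12") == 12
assert missing(real("5K"))

* empty string and "." both give numeric missing, as noted above
assert missing(real(""))
assert missing(real("."))

* strings that are not "digits only" but still convert
assert real("-1.5") == -1.5
assert real("1e3") == 1000
```

So a nonmissing result from -real()- says that the string can be read 
as a number, not that it consists of the digits 0-9 alone.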

In the -egenmore- package on SSC there is 
a function -sieve()- which may help here. 

sieve(strvar) , { keep(classes) | char(chars) | omit(chars) } 
selects characters from strvar according to a specified criterion 
and generates a new string variable containing only those characters. 
This may be done in three ways. First, characters are classified using
the keywords alphabetic (any of a-z or A-Z), numeric (any of 0-9), 
space or other. keep() specifies one or more of those classes: 
keywords may be abbreviated by as little as one letter. Thus keep(a n) 
selects alphabetic and numeric characters and omits spaces and other 
characters. Note that keywords must be separated by spaces. Alternatively, 
char() specifies each character to be selected or omit() specifies each
character to be omitted. Thus char(0123456789.) selects numeric 
characters and the stop (presumably as decimal point); omit(" ") strips 
spaces and omit(`"""') strips double quotes. (Stata 7 required.) 
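
A minimal sketch of the three ways of calling -sieve()- described 
above, on a toy dataset. It assumes that -egenmore- has been installed 
(-ssc install egenmore-); the data and the new variable names (digits, 
letters, number, nospace) are made up for illustration.

```stata
* toy data: values invented for this example
clear
input str12 strvar
"id14"
"5K"
"3.5 kg"
end

* keep() selects by character class
egen digits  = sieve(strvar), keep(n)
egen letters = sieve(strvar), keep(a)

* char() lists the characters to select
egen number  = sieve(strvar), char(0123456789.)

* omit() lists the characters to leave out
egen nospace = sieve(strvar), omit(" ")

list
```

Comparing each new variable with strvar in the listing shows which 
kinds of character each value contains.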

So you could look at a string variable like this. 

egen N = sieve(strvar), keep(n) 
capture assert N == strvar 
if _rc { 
	// non-numeric characters present 
	egen S = sieve(strvar), keep(a) 
	capture assert S == strvar 
	if _rc { 
		// must be a mixture
		<code for this case> 
	} 
	else { 
		// must be all alphabetic
		<code for this case> 
	}
	// S exists only on this branch 
	drop S 
}
else { 
	// must be all numeric 
	<code for this case> 
} 
drop N 
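
The -assert- approach above classifies the variable as a whole. For 
the observation-by-observation categories asked for in the original 
question, one possible sketch with the same -sieve()- function follows. 
It assumes -egenmore- is installed; the helper variables digits and 
letters are my own names, and the data are the example values from the 
question. Values that are empty, or that contain spaces or other 
characters, are left with catvar missing.

```stata
* example values from the original question
clear
input str10 stringvar
"1"
"12"
"id14"
"run"
"5K"
"SPRINT"
end

* digits-only and letters-only versions of each value
egen digits  = sieve(stringvar), keep(n)
egen letters = sieve(stringvar), keep(a)

gen byte catvar = .

* 1: only digits
replace catvar = 1 if stringvar != "" & digits == stringvar

* 3: only letters
replace catvar = 3 if stringvar != "" & letters == stringvar

* 2: both digits and letters, and nothing else
replace catvar = 2 if digits != "" & letters != "" & length(digits) + length(letters) == length(stringvar)

drop digits letters
list stringvar catvar
```

The length comparison in the last -replace- is a design choice: it 
keeps values such as "a-1" out of category 2, since the question 
mentions only letters and numbers.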

Nick 
n.j.cox@durham.ac.uk 

TEWODAJ MOGUES

> I looked through the string functions to try to find out which 
> values of a string variable have letters plus numbers, only 
> letters, and only numbers, but didn't come up with anything. 
> E.g. suppose I wanted to create a categorical variable that takes 
> on 1 when stringvar has only numbers, 2 if a mix of numbers and 
> letters, and 3 if only letters:
> 
> stringvar catvar
> 1           1
> 12          1
> id14        2
> run         3
> 5K          2
> SPRINT      3 
 



