Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: RE: RE: AW: Search for string values in dataset??


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: RE: AW: Search for string values in dataset??
Date   Mon, 21 Feb 2011 17:42:09 +0000

These answers focused on looking in a single named variable. An extension of the problem is to find which string variables contain any such values. 

Using -findname- (from the Stata Journal; it must be installed first):

. findname, any(strpos(@, "GmbH"))

which you could extend using -lower()- if desired. 
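
A minimal sketch of that extension, using made-up example data (the variable names firm, parent and id are hypothetical). The -findname- line only runs if -findname- is installed. The loop after it is an assumed alternative that uses official commands only; it was not proposed in the thread.

```stata
* Hypothetical example data: two string variables and one numeric variable
clear
input str12 firm str12 parent byte id
"Acme GmbH"  "Holding AG"  1
"Beta Ltd"   "Gamma GMBH"  2
"Delta Inc"  "Omega plc"   3
end

* Case-insensitive -findname- call: lower() is applied to each variable
* before searching (skipped if -findname- is not installed)
capture which findname
if _rc == 0 findname, any(strpos(lower(@), "gmbh"))

* Alternative with official commands: loop over the string variables
* and keep those with at least one case-insensitive match
ds, has(type string)
local strvars `r(varlist)'
local hits
foreach v of local strvars {
    quietly count if strpos(lower(`v'), "gmbh")
    if r(N) > 0 local hits `hits' `v'
}
display "String variables containing gmbh: `hits'"
```

With this example data, both firm and parent end up in the local macro hits, because "Gamma GMBH" matches once it is lowercased.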

-strpos()- is the modern name for -index()-. 

Note that -strmatch()- and -regexm()- are functions, and not commands. 
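
To illustrate the distinction, here is a short sketch with a made-up literal string. A function returns a value, so it has to appear inside an expression passed to a command such as -display-, -generate- or -list-; it cannot be typed on its own.

```stata
* Functions return values, so they are evaluated inside a command
display strpos("Acme GmbH", "GmbH")      // position of the first match, 0 if none
display strmatch("Acme GmbH", "*GmbH*")  // 1 if the wildcard pattern matches, else 0
display regexm("Acme GmbH", "GmbH")      // 1 if the regular expression matches, else 0
```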

Nick 
n.j.cox@durham.ac.uk 

Eric Booth

One way to match case-insensitively is to create a lowercase (and, optionally, temporary) version of the string variable and match on that:

*********************!
clear
inp str10(var1)
"GmbH7UuIZ"
"GMbH7UuIZ"
"gmbh7Uuiz"
end

//1.  from Markus and Junlin//
g one = 1 if strmatch(var1, "*GmbH*")
g byte two = regexm(var1, "GmbH")
list var1 if regexm(var1, "GmbH")

//2.  Another option:  index //
g str10 three = var1 if index(var1, "GmbH")
g str10 four = var1 if index(var1, "gmbh")
list var1 if index(var1, "GmbH")

//3.  case insensitive - lower case matches //
tempvar var1_lower 
g `var1_lower' = lower(var1)
g str10 five = var1 if index(`var1_lower', "gmbh") 
/* could evaluate to 1 if matched (instead of var1 contents) */
l
*********************!
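
As a variation on the same idea, -lower()- can also be nested directly inside each matching function, which avoids the extra variable. This is a sketch only; the fourth observation and the names m_strmatch, m_regexm and m_strpos are added here for illustration.

```stata
* Same example values as above, plus one hypothetical non-matching row
clear
input str10 var1
"GmbH7UuIZ"
"GMbH7UuIZ"
"gmbh7Uuiz"
"Ltd7UuIZ"
end

* Lowercase the variable on the fly and match against a lowercase pattern
generate byte m_strmatch = strmatch(lower(var1), "*gmbh*")
generate byte m_regexm   = regexm(lower(var1), "gmbh")
generate byte m_strpos   = strpos(lower(var1), "gmbh") > 0
list
```

All three indicators are 1 for the first three observations and 0 for the last. The pattern itself must be written in lowercase for this to work.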

Liao, Junlin 

> Continuing on this topic, does anyone have a good way to make those two commands case-insensitive? Thanks,

Liao, Junlin
 
> Another command is -regexm-
> 
> gen byte GmbH_Match = regexm(variable, "GmbH")
> 
> If you simply want to list the entries:
> 
> list variable if regexm(variable, "GmbH")
 
Wiemann, Markus
 
> try the -strmatch- command.
> For example:
> gen GmbH = 1 if strmatch(variablename, "*GmbH*")
 
miyu Lee
 
> Is there ANY way to search for specific string values in a dataset with string variables? For example: I am searching for all entries showing the part "GmbH" in a vector with firm names. I have a bad feeling about this!
