Statalist The Stata Listserver


Re: st: Can you search within a dataset for a particular character?

From   David Kantor
Subject   Re: st: Can you search within a dataset for a particular character?
Date   Wed, 07 Mar 2007 15:00:00 -0500

At 02:44 PM 3/7/2007, wrote:
I have a dataset of names stored as string variables. Is there any way to search the dataset for all data containing a particular character, a hyphen for example? Also, is there any way to search the same dataset for all names containing a space (as in last names with a space between them ex. von cannon)? Thank you!
You can do it on a single variable, using the index function.

Say the string variable is named name. Say you want to look for a hyphen.

list name if index(name, "-")

or, if you just want to flag them with a 0/1 indicator (index() returns the position of the first match, or 0 if there is none):
gen byte hashyphen = index(name, "-") > 0

If you are looking for a space, just use " " in place of "-". But be sure that there are no leading or trailing spaces -- which you probably don't want to catch. That's usually not a problem, but you can use the trim function to deal with that, in case it is.
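As a minimal sketch of the space search with trim, assuming a toy dataset typed in with input (the values and the variable name hasspace are made up for illustration):

```stata
* Toy data; the second name has only leading/trailing blanks
clear
input str20 name
"von cannon"
" smith "
"smith-jones"
end

* trim() strips leading and trailing blanks, so only internal spaces count
gen byte hasspace = index(trim(name), " ") > 0
list name hasspace
```

Without trim(), the second name would be flagged as well, because of its leading blank.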

If you want to look for letters, you need to be careful about case. Say you want to look for "von". Do you also want to look for "Von" or "VON"? Are the values in name in lower, upper, or mixed case? The best way to handle it, assuming you want to be case-insensitive, is to make sure that the string values (in name) and the text you are searching for are all in one case. If they are not already in one case, then you can do:
gen byte has_von = index(upper(name), "VON") > 0

If your names are in more than one variable, you will need to search each variable and combine the results -- assuming that the value you are searching for does not span from one variable to another.
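One way to combine the results across variables is a loop, sketched here under the assumption that the names sit in two string variables called first and last (hypothetical names, as is anyhyphen, and the data are made up):

```stata
* Toy data with the name split across two variables
clear
input str20 first str20 last
"mary-ann" "smith"
"john" "von cannon"
"ann" "jones"
end

* Start at 0, then switch to 1 if any of the variables contains a hyphen
gen byte anyhyphen = 0
foreach v of varlist first last {
    replace anyhyphen = 1 if index(`v', "-") > 0
}
list first last anyhyphen
```

This checks each variable separately, so it will not find a value that spans from one variable to the next.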


