
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: Comparing strings within the same variable

Subject   st: Comparing strings within the same variable
Date   Thu, 23 Feb 2012 15:54:59 -0500

I'm working with data that has been typed in from product catalogs,
across years and countries.  The name used in the catalog can differ
within a given product ID category, so I know it's the same good as
long as the product ID is correct.  Many goods do not have IDs,
though, so the names are critical to matching.  Within the "name"
variable, there can be two very similar names, such as (1) billy bob
and (2) billy.  I would like to know how to detect whether all the
characters in observation (1) are contained in observation (2), or
vice versa.  I would consider these two names as being "close" and
probably the same good, whereas (1) billy and (2) krepko are very
different names and are probably not the same good.

Is there a way to use string functions to determine whether string A
is contained in string B?  I would want to refer to each string as the
value of a variable in a particular observation, since I can't type
in the strings by hand: there are 200,000 observations on several
variables.
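
To make the containment check concrete, here is a minimal sketch in Python rather than Stata (in Stata, strpos() plays a similar role for testing whether one string occurs inside another). The function names is_contained and close_pairs are hypothetical, chosen only for illustration. The sketch assumes that "contained" means one name appears as a contiguous substring of the other after trimming spaces and lowercasing, and it compares only names that are adjacent in sorted order, so it will mainly catch cases where one name is the beginning of the next.

```python
def is_contained(a: str, b: str) -> bool:
    """Return True if either name occurs as a substring of the other.

    Assumption: names are compared after trimming whitespace and
    lowercasing; empty names never match.
    """
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return False
    return a in b or b in a


def close_pairs(names):
    """Return pairs of adjacent names (in sorted order) that are 'close'.

    Sorting places a name directly before longer names that begin with it,
    so one pass over neighbours avoids comparing all pairs. Names related
    in other ways (e.g. a match in the middle of a string) are not
    guaranteed to be found by this pass.
    """
    ordered = sorted({n.strip().lower() for n in names if n.strip()})
    return [(x, y) for x, y in zip(ordered, ordered[1:]) if is_contained(x, y)]


if __name__ == "__main__":
    print(close_pairs(["krepko", "billy bob", "billy"]))
```

With 200,000 observations, comparing every pair of names would be costly, which is why the sketch looks only at neighbours after sorting; a full all-pairs comparison within smaller groups (say, within country and year) would be a different trade-off.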

Thank you in advance for any help you can offer,

Marianne Baxter
Professor of Economics
Boston University
Boston, MA 02215
