
Re: st: Re: string

From	Viktor Slavtchev <[email protected]>
To	[email protected]
Subject	Re: st: Re: string
Date	Mon, 17 Mar 2008 18:08:04 +0100

Michael and Vladimir, thank you very much.
viktor


Michael Blasnik wrote:

...
You may want to try -reclink-, available from SSC (-findit reclink-). It uses the bigram string comparator to assess the degree of match between fields. If you have lots of short strings, you may want to adjust the minbigram option downward to get more matches.

Michael Blasnik (author of reclink)
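
To make the bigram idea concrete, here is a minimal sketch in Python. It scores two strings with a Dice coefficient on character bigrams (adjacent character pairs). This is only an illustration of the general technique: the function names are made up for this example, and the exact scoring that -reclink- uses may differ.

```python
from collections import Counter


def bigrams(text):
    """Return the adjacent character pairs of a trimmed, lower-cased string."""
    cleaned = text.strip().lower()
    return [cleaned[i:i + 2] for i in range(len(cleaned) - 1)]


def bigram_similarity(a, b):
    """Dice coefficient on bigrams: 2 * shared / (bigrams in a + bigrams in b).

    Hypothetical helper for illustration; not the reclink implementation.
    """
    count_a, count_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams to compare.
        return 0.0
    shared = sum((count_a & count_b).values())  # multiset intersection
    return 2.0 * shared / total


if __name__ == "__main__":
    pairs = [("Berlin", " Berlin"), ("Rome", "Roma,IT"), ("Paris, FR", "Paris/FR")]
    for left, right in pairs:
        print(repr(left), repr(right), round(bigram_similarity(left, right), 3))
```

Strings that are identical after trimming score 1.0, and strings with no bigram in common score 0.0. Short strings have few bigrams, so a single differing character lowers the score a lot, which is one reason a lower acceptance threshold can help with short names.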

----- Original Message -----
From: "Viktor Slavtchev" <[email protected]>
To: <[email protected]>
Sent: Monday, March 17, 2008 9:48 AM
Subject: st: string

Dear list,
I want to merge two files where the common variable is a string (names of cities). However, there are unsystematic differences in the notation.
For example, you can find "Berlin" in the first file but " Berlin" in the second. In other cases you can find "Rome" and "Roma,IT", or "Paris, FR" and "Paris/FR".
I was not able to find any systematic pattern in the notation. I have over 40,000 unique observations.
How can I search for substrings in Stata? For example, for "*Rom*", the largest match between "Rome" and "Roma,IT".
I think this could help to solve some of the problems. Or does anybody know a better way to deal with this kind of 'bad' data?
thanks
viktor
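
The cleaning steps implied by these examples can be sketched as follows, in Python for illustration. The sketch assumes that anything after the first comma or slash is a country suffix that can be dropped, which may not hold for every city name in the data. Both function names are hypothetical. In Stata itself, string functions such as trim(), lower(), and strpos() cover the trimming, case-folding, and substring-search steps.

```python
import re
from difflib import SequenceMatcher


def normalize_city(name):
    """Trim, lower-case, and drop anything after the first comma or slash.

    Assumption: the text after "," or "/" is a country code, not part of the name.
    """
    head = re.split(r"[,/]", name, maxsplit=1)[0]
    return " ".join(head.split()).lower()


def longest_common_substring(a, b):
    """Return the longest run of characters that appears in both strings."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    for raw in ["Berlin", " Berlin", "Paris, FR", "Paris/FR", "Rome", "Roma,IT"]:
        print(repr(raw), "->", repr(normalize_city(raw)))
    print(longest_common_substring(normalize_city("Rome"), normalize_city("Roma,IT")))
```

Normalization alone reconciles the "Berlin" and "Paris" variants. "Rome" and "Roma" still differ afterwards, so pairs like that need either a lookup table or an approximate comparison.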
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

