
st: RE: identifying (not counting) observations repeated across observations

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: identifying (not counting) observations repeated across observations
Date	Tue, 25 Aug 2009 14:06:46 +0100

This raises essentially the same issues, and thus the same answers, as
your earlier question this month, which provoked comments about soundex,
etc. 

Nick 
[email protected] 

Dalhia

I have a question and have tried countmatch and also the FAQ on
identifying distinct observations across variables, but I need something
a little bit different and can't figure out how to get it. 

Here is the problem. I have a set of ids, each matched with a variety of
names that all refer to the same entity. Here is how the data look:

id, name1, name2, name3

A1, AOL, AOL Time Warner, AOL
A2, Time, Time Inc, AOL
A3, Microsoft, MS Office, Micsoft
A4, AL, AOL, Bla

I need to somehow recognize that A1, A2, and A4 (and all the names
attached to each of them) refer to the same entity. Is there a way to
form a new variable, say "same_entity", that identifies those
observations in which one or more of the names reappear? Here is what I
would like to get:

id, name1, name2, name3, same_entity

A1, AOL, AOL Time Warner, A O L, 1
A2, Time, Time Inc, AOL, 1
A3, Microsoft, MS Office, Micsoft, 2
A4, AL, AOL, Bla, 1
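
The grouping requested above can be read as a connected-components problem: two ids belong to the same entity if they share any name, directly or through a chain of other ids. The following is only a minimal sketch of that logic in Python, not Stata code and not a method proposed in this thread. The function name same_entity and the normalization (trimming and lower-casing) are assumptions made for illustration. Matching is exact after normalization, so variants such as "A O L" or "Micsoft" would not be linked without a fuzzier comparison such as soundex.

```python
def same_entity(rows):
    """rows: list of (id, [names]). Returns {id: group number}.

    Ids sharing any name (after trimming and lower-casing, an assumed
    normalization) get the same group number. Groups are numbered from 1
    in order of first appearance.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_id_for_name = {}
    for rid, names in rows:
        parent[rid] = rid
        for name in names:
            key = name.strip().lower()
            if not key:
                continue  # ignore empty name fields
            if key in first_id_for_name:
                # Name seen before: link this id to the id that had it first.
                union(first_id_for_name[key], rid)
            else:
                first_id_for_name[key] = rid

    group_number = {}
    result = {}
    for rid, _ in rows:
        root = find(rid)
        if root not in group_number:
            group_number[root] = len(group_number) + 1
        result[rid] = group_number[root]
    return result


# The input data from the question.
rows = [
    ("A1", ["AOL", "AOL Time Warner", "AOL"]),
    ("A2", ["Time", "Time Inc", "AOL"]),
    ("A3", ["Microsoft", "MS Office", "Micsoft"]),
    ("A4", ["AL", "AOL", "Bla"]),
]
groups = same_entity(rows)
print(groups)  # {'A1': 1, 'A2': 1, 'A3': 2, 'A4': 1}
```

On the example data, A1, A2, and A4 are linked through the shared name "AOL", and A3 forms its own group, which matches the same_entity column shown above.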


