
st: AW: rationalizing multiple ids for the same name

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	st: AW: rationalizing multiple ids for the same name
Date	Tue, 18 Aug 2009 09:28:23 +0200


Everything rides on what "the same name" means: sometimes there is an "Inc"
at the end, sometimes not. If you are willing to assume that matching on
some part of the "Name" string is enough, you can use the function
-substr()- to extract that part, but I would consider that rather hazardous,
since unrelated companies can share the same leading characters.
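
As a rough sketch of that first part, assuming the only variation in the
names is a trailing "Inc" or "Inc.": the variable names Name_sub and Name_re
and the cutoff of 15 characters are made up for illustration. -substr()-
keeps a fixed-length prefix, as suggested above; -regexr()- is a different
technique that removes only the suffix and so may be a little less
hazardous. Neither will cope with other kinds of spelling differences.

***
clear*

input str25 Name
"AOL Time Warner"
"AOL Time Warner Inc"
"AOL Time Warner Inc."
"Microsoft"
end

// keep the first 15 characters only (hypothetical cutoff;
// any two firms sharing those 15 characters would be merged)
gen Name_sub = trim(substr(Name, 1, 15))

// alternative: drop a trailing " Inc" or " Inc." and nothing else
gen Name_re = trim(regexr(Name, " Inc\.?$", ""))

list, noobs
***

Whichever variable you trust would then take the place of Name in the
-bysort- prefix of the second part.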

Once the names are consistent, you can use the -egen- function -mode()- to
get the most frequent ticker within each of the newly created "names".

Here is the second part, with the names already made consistent:


***
clear*

input str20(Name Ticker)
"AOL Time Warner" "AOL"
"AOL Time Warner" "TW"
"AOL Time Warner" "TWX"
"AOL Time Warner" "TWX"
"AOL Time Warner" "T"
"Microsoft" "MS" 
end

compress

//trim the name to get rid of blanks
replace Name=trim(Name)

// most frequent Ticker within each Name (ties are set to missing by default)
bys Name: egen freqtick = mode(Ticker)
list, noobs
***
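
One thing to watch: when two tickers are equally frequent within a name,
-egen, mode()- returns missing for that group unless told otherwise. The
sketch below uses made-up data with a deliberate tie and the documented
-minmode- option, which picks the lowest of the tied values (alphabetical
order for strings). The variable names tick_default and tick_min are
hypothetical, and whether the lowest ticker is the right tie-breaker is a
substantive question that only you can answer.

***
clear*

input str20(Name Ticker)
"AOL Time Warner" "AOL"
"AOL Time Warner" "TWX"
"Microsoft" "MS"
end

// default behaviour: the tied group should be left missing (empty string)
bys Name: egen tick_default = mode(Ticker)

// minmode: on a tie, take the lowest of the tied values
bys Name: egen tick_min = mode(Ticker), minmode

list, noobs
***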



HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dalhia
Sent: Tuesday, 18 August 2009 06:17
To: [email protected]
Subject: st: rationalizing multiple ids for the same name

Dear Statalist, I have a question and I am hoping for some help. 

I have a very large dataset of companies over time, and I have two different
identifiers for these companies - name and ticker. The problem is that the
two identifiers are not always consistent. For instance:

Name, Ticker

AOL Time Warner, AOL
AOL Time Warner, TW
AOL Time Warner, TWX
AOL Time Warner Inc, TWX
AOL Time Warner Inc, T
Microsoft, MS

Basically the first 5 observations provide data about the same entity, AOL
Time Warner, and I need a way of recognizing that these are all the same
company. What I think will work is to check those names for which multiple
tickers exist, and use the ticker which appears in the dataset the most, and
put this most frequent ticker in a new variable New_Ticker. Here is how the
data should now look: 

Name, Ticker, New_Ticker

AOL Time Warner, AOL, TWX
AOL Time Warner, TW, TWX
AOL Time Warner, TWX, TWX
AOL Time Warner Inc, TWX, TWX
AOL Time Warner Inc, T, TWX
Microsoft, MS, MS

I am unable to figure out how to create this new variable New_Ticker, which
basically has the most frequently used ticker in cases where the same name
has multiple tickers. I will be very grateful for any help on how to create
a variable which does the above.

Best
dalhia


      
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



Follow-Ups:
- st: RE: AW: rationalizing multiple ids for the same name
  - From: "Nick Cox" <[email protected]>

References:
- st: coefficients lost when mata code placed in eclass program
  - From: "Nelson, Carl" <[email protected]>
- st: rationalizing multiple ids for the same name
  - From: Dalhia <[email protected]>
