Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: Merging files based on name and year
From
Joseph Monte <[email protected]>
To
statalist <[email protected]>
Subject
st: Merging files based on name and year
Date
Thu, 25 Aug 2011 09:24:16 +0100
Dear Statalisters,
I would appreciate help on this issue.
The file below is the master file containing names of companies and
years. ABC Inc is one company, XYZ Corp is another company, PNG LLC is
a third company. I have a total of 1100 different companies. As shown
below, sometimes two or more companies are listed in the same field
(there are a maximum of 5 companies listed in the same field). The
year column has only one year for each observation. There are a total
of 800 observations in this file.
name year
ABC Inc 1986
XYZ Corp 1994
ABC Inc XYZ Corp 2001
PNG LLC 2005
XYZ Corp PNG LLC 2007
I have a second file with data in the following format. The 1100
companies are listed as shown below. YR8084 means the years 1980-1984,
YR8591 means the years 1985-1991, and so on. The numbers below each
year are scores assigned to each company during a certain period. For
example, ABC Inc is assigned a score of 6 during 1980-1984, 7 from
1985-1991, and 9 from 2001-2004. ABC Inc is not assigned a score
during other periods. Scores range from 1 to 9 and may be up to 3
decimal places.
name YR8084 YR8591 YR9200 YR0104 YR0507 YR0809
ABC Inc 6 7 9
XYZ Corp 2 5 6 6
PNG LLC 7 7 7
7 7
I want the master file to include a column with scores as shown below.
For example, ABC Inc gets a score of 7 in 1986, XYZ Corp gets a score
of 5 in 1994. For observations with two or more names, I want a simple
average of scores. For example, for ABC Inc XYZ Corp, the score will
be (9+6)/2=7.5. If a company has a year in the master file, then it
definitely has a score for that year (i.e. time period) in the second
file.
name year score
ABC Inc 1986 7
XYZ Corp 1994 5
ABC Inc XYZ Corp 2001 7.5
PNG LLC 2005 7
XYZ Corp PNG LLC 2007 6.5
I am using Stata 12. I expect I would need the -regexm()- command to
split the company names, then -reshape- to get all company names one
below the other and then -merge-. Since there are 1100 companies, I
would need some kind of a loop to use the -regexm() command. I am
having trouble writing the code.
Thanks,
Joe
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/