Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: One to many merge


From   "Amy Dunbar" <[email protected]>
To   <[email protected]>
Subject   st: One to many merge
Date   Tue, 26 Jan 2010 09:50:32 -0500

I want to create a dataset, call it 'match-control' that groups together
all the observations in the same industry-year-decile combination, but
excludes the observation's own values for a particular variable. Thus if
there is only one firm in a decile, that firm will not be in the
combined sample. The resulting sample has obs= N * (N-1). If there are 3
firms, there will 3 *2 = 6 in the resulting match_control group for the
industry_year_decile.

Assume I have a dataset (match) that includes a unique identifier
(gvkey), industry (sic2), year, and decile (rroa), and the variable of
interest (dacc); I duplicate the dataset (control) renaming the vars as
c_gvkey, c_sic2, c_year, c_rroa, and c_dacc. For example, assume that
for the first decile I have the following for match and the same for
control, except that the var names have c_:

data = match			 	
N	gvkey	year	sic2	rroa	     dacc
1	1860	1991	13		0	0.11
2	2304	1991	13		0	-0.43
3	3949	1991	13		0	-0.31
4	5140	1991	13		0	-0.38
5	5188	1991	13		0	-0.28
6	7159	1991	13		0	-0.16
7	8104	1991	13		0	0.04
8	8838	1991	13		0	-0.07
9	8974	1991	13		0	0.08
10	12042	1991	13		0	-0.97
11	13183	1991	13		0	-0.28
12	15306	1991	13		0	-0.30
13	22398	1991	13		0	-0.21
14	23129	1991	13		0	-0.27
15	27199	1991	13		0	-0.17


Consider the SIC2=13 group. The first decile (coded 0) in the sample)
has 15 firms in 1991.  The resulting group is 15 * 14 = 210 firms in the
match-control sample. The merging for the first gvkey is the following:

data = match_control					
gvkey	year	sic2	dacc	rroa	c_dacc
1860	1991	13	0.11	0	-0.43
1860	1991	13	0.11	0	-0.31
1860	1991	13	0.11	0	-0.38
1860	1991	13	0.11	0	-0.28
1860	1991	13	0.11	0	-0.16
1860	1991	13	0.11	0	0.04
1860	1991	13	0.11	0	-0.07
1860	1991	13	0.11	0	0.08
1860	1991	13	0.11	0	-0.97
1860	1991	13	0.11	0	-0.28
1860	1991	13	0.11	0	-0.30
1860	1991	13	0.11	0	-0.21
1860	1991	13	0.11	0	-0.27
1860	1991	13	0.11	0	-0.17

In SAS, the following code does the trick.
proc sql;
      create table match_control as select 
      match.*, control.c_dacc
      from match, control
      where sic2 = c_sic2
      and year = c_year
      and rroa = c_rroa
      and gvkey ne c_gvkey;   /* the magic words */

I have tried merge and joinby, but I have had no success.  Can someone
tell me how to accomplish this one to many merge in Stata?

Thank you.

Amy Dunbar
University of Connecticut
School of Business
Department of Accounting
2100 Hillside Road Unit 1041
Storrs, CT 06269-1041

[email protected]

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index