Re: st: tricky data merge/joinby problem
From: "Dimitriy V. Masterov" <[email protected]>
To: [email protected]
Date: Fri, 4 Mar 2011 12:35:24 -0500
Just to follow up on this for posterity: the "panelized" merge approach
seems to be roughly twice as fast as the joinby method on fake data.
Simple code below.
#delimit;
version 11.1;
set more off;
capture set trace off;
clear all;
macro drop _all;
set mem 5g;
tempfile file1 file2 joinbydata;
/* create panel data */
input
bgid	str6 dateyq	bgpop;
1	2010q1	100;
1	2010q2	105;
1	2010q3	106;
1	2010q4	125;
2	2010q1	110;
2	2010q2	115;
2	2010q3	116;
2	2010q4	135;
end;
save `file2';
clear;
/* create fraction data */
input
bid	bgid	fracpop;
11	1	.5;
12	1	.5;
21	2	.3;
22	2	.2;
23	2	.5;
end;
save `file1';
/* (1) joinby approach */
timer on 1;
joinby bgid using `file2';
timer off 1;
sort bid dateyq;
list, sepby(bid);
save `joinbydata';
/* (2) panelize and merge approach */
use `file1', clear;
timer on 2;
expand 4;
sort bid;
bys bid: gen dateyq=string(_n);
strrec dateyq ("1"="2010q1") ("2"="2010q2") ("3"="2010q3")
("4"="2010q4"), replace;
/* list, sepby(bid); */
merge m:1 bgid dateyq using `file2';
timer off 2;
timer list 1;
timer list 2;
sort bid dateyq;
list, sepby(bid);
drop _merge;
/* Compare approaches */
cf * using `joinbydata', all;
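/* m:m merge fails: it pairs rows in order within bgid rather than
forming all combinations, so the result does not match the joinby data */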
use `file1', clear;
merge m:m bgid using `file2';
capture noisily cf * using `joinbydata', all;
timer clear;
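
The "panelize and merge" step can also be sketched without strrec, which does not appear to be part of official Stata and so may need to be installed separately. This is only a minimal sketch under the same toy-data assumptions as the code above: exactly four quarters, all in 2010, so the "2010q" prefix and the expand count of 4 are hard-coded. The tempfile name `panel' is made up for illustration, and the sketch uses the default line delimiter instead of #delimit;.

```stata
* Sketch only: rebuilds the toy data and does step (2) without strrec.
version 11.1
clear
tempfile panel

* block-group panel: one row per bgid and quarter
input bgid str6 dateyq bgpop
1 2010q1 100
1 2010q2 105
1 2010q3 106
1 2010q4 125
2 2010q1 110
2 2010q2 115
2 2010q3 116
2 2010q4 135
end
save `panel'

* fraction data: one row per block (bid) within a block group (bgid)
clear
input bid bgid fracpop
11 1 .5
12 1 .5
21 2 .3
22 2 .2
23 2 .5
end

* one copy of each block per quarter (assumes exactly 4 quarters),
* then label the copies 2010q1 .. 2010q4 (assumes a single year)
expand 4
bysort bid: generate dateyq = "2010q" + string(_n)

* every block-quarter should match exactly one panel row
merge m:1 bgid dateyq using `panel'
assert _merge == 3
drop _merge

sort bid dateyq
list, sepby(bid)
```

With real data covering more periods, the hard-coded prefix and count would need to be replaced by something derived from the panel file itself.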