
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: tricky data merge/joinby problem


From   "Dimitriy V. Masterov" <[email protected]>
To   [email protected]
Subject   Re: st: tricky data merge/joinby problem
Date   Fri, 4 Mar 2011 12:35:24 -0500

Just to follow up on this for posterity: the "panelized" merge approach
seems to be roughly twice as fast as the joinby method on fake data.

Simple code is below.

#delimit;
version 11.1;
set more off;
set trace off;
clear all;
macro drop _all;
set mem 5g;


tempfile file1 file2 joinbydata;

/* create panel data */
input
bgid	str6 dateyq	bgpop;
1	2010q1	100;
1	2010q2	105;
1	2010q3	106;
1	2010q4	125;
2	2010q1	110;
2	2010q2	115;
2	2010q3	116;
2	2010q4	135;
end;

save `file2';

clear;

/* create fraction data */
input
bid	bgid	fracpop;
11	1	.5;
12	1	.5;
21	2	.3;
22	2	.2;
23	2	.5;
end;

save `file1';


/* (1) joinby approach */
timer on 1;
joinby bgid using `file2';
timer off 1;


sort bid dateyq;
list, sepby(bid);

save `joinbydata';


/* (2) panelize and merge approach */

use `file1', clear;

timer on 2;
expand 4;
/* -strrec- is user-written (not part of official Stata); build the
   quarter label directly instead. bysort also makes a separate sort redundant */
bysort bid: gen dateyq = "2010q" + string(_n);
/* list, sepby(bid); */

merge m:1 bgid dateyq using `file2';
timer off 2;
timer list 1;
timer list 2;

sort bid dateyq;
list, sepby(bid);

drop _merge;

/* Compare the two approaches */

cf * using `joinbydata', all;

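/* m:m merge fails: it pairs observations in order within bgid
   instead of forming all combinations, so it cannot reproduce joinby */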
use `file1', clear;
merge m:m bgid using `file2';
cf * using `joinbydata', all;

timer clear;
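A minimal self-contained sketch of why the m:m merge above does not reproduce joinby, using a hypothetical cut-down version of the same data (one block group, two blocks, two quarters). joinby forms every block-quarter pair (2 x 2 = 4 rows). merge m:m instead matches the first master observation in a bgid with the first using observation, the second with the second, and so on, so it returns only 2 rows. Like the script above, it uses #delimit ; and /* */ comments.

```stata
#delimit ;
clear;

/* Hypothetical panel: one block group observed in two quarters */
input bgid str6 dateyq bgpop;
1 2010q1 100;
1 2010q2 105;
end;
tempfile panel;
save `panel';

/* Hypothetical fractions: two blocks in that block group */
clear;
input bid bgid fracpop;
11 1 .5;
12 1 .5;
end;

/* joinby forms every block-quarter pair: 2 x 2 = 4 rows */
preserve;
joinby bgid using `panel';
assert _N == 4;
restore;

/* merge m:m pairs rows in order within bgid (1st with 1st, 2nd with 2nd),
   so it yields 2 rows and each block gets only one quarter */
merge m:m bgid using `panel';
assert _N == 2;
list, noobs;
```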
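To check the timing on more than a handful of observations, one could generate larger fake data along these lines. This is a sketch under stated assumptions: the sizes (1,000 block groups, 5 blocks each, 40 quarters starting in 2010q1), the seed, and the numeric %tq date variable dateq are all hypothetical choices for illustration, not the data from the original question. Building dates with yq() also removes the need to recode quarter labels by hand. It assumes enough memory is available (see set mem in the script above for Stata 11).

```stata
#delimit ;
clear;
set seed 12345;

/* Hypothetical sizes, for illustration only */
local nbg  1000;   /* block groups */
local nblk 5;      /* blocks per block group */
local nq   40;     /* quarters, starting 2010q1 */

/* Panel file: one row per block group per quarter */
set obs `nbg';
gen bgid = _n;
expand `nq';
bysort bgid: gen int dateq = yq(2010, 1) + _n - 1;
format dateq %tq;
gen bgpop = 100 + floor(50*runiform());
tempfile panel;
save `panel';

/* Fraction file: equal shares for each block within its block group */
clear;
set obs `=`nbg'*`nblk'';
gen bid = _n;
gen bgid = ceil(_n/`nblk');
gen fracpop = 1/`nblk';
tempfile frac;
save `frac';

timer clear;

/* (1) joinby */
timer on 1;
joinby bgid using `panel';
timer off 1;
assert _N == `nbg'*`nblk'*`nq';

/* (2) panelize with expand, then merge m:1 on bgid and quarter */
use `frac', clear;
timer on 2;
expand `nq';
bysort bid: gen int dateq = yq(2010, 1) + _n - 1;
merge m:1 bgid dateq using `panel';
timer off 2;
assert _N == `nbg'*`nblk'*`nq';
assert _merge == 3;

timer list;
```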

