


st: Different results for 1:1-merging using the same variables (int & string)

From   "Hofbaur, Ulrich" <>
To   "" <>
Subject   st: Different results for 1:1-merging using the same variables (int & string)
Date   Wed, 13 Feb 2013 18:07:12 +0000

Dear all, 

I have an issue with a 1:1 merge in Stata. The merge is based on two variables. The first variable (string) consists of exactly 6 digits. The second variable (integer) consists of exactly 4 digits (there is no variation in length in either of the two variables). I tried two versions, and they yielded different results. Please also note that I use the same file to merge and the variables differ

Option 1: Define a 10-digit string variable. To do so, I convert "var2" to a string and then concatenate var1 and var2. This gives "var3" (a 10-digit string; again, no variation in length), and I merge (1:1) on "var3". → Results in 15,839 matches
Option 2: Merge (1:1) on var1 and var2 as separate variables. → Results in 14,227 matches

Does anybody know where this difference comes from? My gut feeling is that Option 2 is the more reliable one, but I lack evidence for that. The abbreviated do-file is included below.

Thank you very much for your support!


******* Do File **************

use F:\001_Forschung\Daten\Cash&Acquisitions\file_A_prelim.dta, clear

* Option 1
	gen acquirorcusip_year=cusip_6dgt+next_year // corresponds to var 3 in the above description
	gen length_cusip_6dgt=length(cusip_6dgt)
	gen length_announcement_year=length(next_year)
	gen length_acquirorcusip_year=length(acquirorcusip_year)
	sum length_cusip_6dgt length_announcement_year length_acquirorcusip_year

	Variable        Obs        Mean    Std. Dev.       Min        Max
	length_cus~t     196217           6           0          6          6
	length_ann~r     196217           4           0          4          4
	length_acq~r     196217          10           0         10         10

* Option 2
	gen announcement_year=next_year // corresponds to var 2 in the above description. Rename due to file_B
	destring announcement_year, replace
	gen acquirorcusip=cusip_6dgt // corresponds to var 1 in the above description.
	sort acquirorcusip announcement_year

save file_A.dta, replace 
* Option 1: Merge on the joint string variable
	use file_A.dta, clear
	merge 1:1 acquirorcusip_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta  

	Result                      # of obs.
	not matched                   191,640
	    from master               180,378  (_merge==1)
	    from using                 11,262  (_merge==2)
	matched                        15,839  (_merge==3)

* Option 2:  Merge on two separate variables
	use file_A.dta, clear
	merge 1:1 acquirorcusip announcement_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta  

	Result                      # of obs.
	not matched                   194,864
	    from master               181,990  (_merge==1)
	    from using                 12,874  (_merge==2)
	matched                        14,227  (_merge==3)
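
One possible way to narrow down where the two options diverge is to check whether the joint key stored in file_B is really the concatenation of the two separate keys stored there. The lines below are only a sketch under assumptions: they assume that file_B contains acquirorcusip_year and acquirorcusip as strings and announcement_year as a numeric variable (both merges ran, so variables with these names should exist in file_B), and key_rebuilt is a hypothetical name introduced for this check.

```stata
* Sketch only: assumes file_B holds acquirorcusip_year (string),
* acquirorcusip (string) and announcement_year (numeric).
* key_rebuilt is a hypothetical helper variable, not part of the original data.
use F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta, clear

* Rebuild the joint key from the two separate variables
gen key_rebuilt = acquirorcusip + string(announcement_year, "%04.0f")

* Count observations where the stored and the rebuilt key disagree
count if key_rebuilt != acquirorcusip_year

* Inspect the disagreeing observations
list acquirorcusip announcement_year acquirorcusip_year key_rebuilt ///
	if key_rebuilt != acquirorcusip_year

* Look for leading or trailing blanks and unexpected lengths in the string keys
count if acquirorcusip != trim(acquirorcusip)
count if acquirorcusip_year != trim(acquirorcusip_year)
gen length_acquirorcusip = length(acquirorcusip)
gen length_acquirorcusip_year = length(acquirorcusip_year)
tab1 length_acquirorcusip length_acquirorcusip_year
```

If the first count is nonzero, the joint key and the separate keys in file_B do not describe the same cusip-year pairs, which by itself could produce different match counts for the two options.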

