



AW: st: Different results for 1:1-merging using the same variables (int & string)


From   "Hofbaur, Ulrich" <Ulrich.Hofbaur@whu.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   AW: st: Different results for 1:1-merging using the same variables (int & string)
Date   Thu, 14 Feb 2013 08:17:03 +0000

Dear all,

Jeph, thanks for your suggestions! "next_year" is an integer that I converted to string with the tostring command, so I am simply concatenating two strings. I created the variables in both files in exactly the same way; I have just verified that.

Ulrich

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin
Sent: Wednesday, February 13, 2013 9:28 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Different results for 1:1-merging using the same variables (int & string)

I do not understand how you can calculate length(next_year) if "next_year" is an integer.

Do you create the variables in both files, or just the -master- file? If just the -master- check that the -using- file variables have been constructed in the same way.
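
One way to act on this suggestion is to rebuild the joint key from its two components in each file and count disagreements. The following is only a minimal sketch: it assumes that both files are reachable as file_A.dta and file_B.dta in the working directory (the paths in the do-file quoted below are longer), that both contain the variables acquirorcusip_year, acquirorcusip and announcement_year, and it introduces a hypothetical helper variable check_key.

```stata
* Sketch under the assumptions stated above; check_key is a hypothetical name.
foreach f in file_A file_B {
    use `f'.dta, clear

    * Rebuild the 10-character key from the string and the numeric component
    gen str10 check_key = acquirorcusip + string(announcement_year, "%4.0f")

    * A nonzero count means the stored joint key is not the concatenation
    * of the two separate key variables in this file
    count if acquirorcusip_year != check_key

    * Both key definitions should uniquely identify observations
    capture noisily isid acquirorcusip_year
    capture noisily isid acquirorcusip announcement_year
}
```

If the count is nonzero in either file, the joint key and the two separate keys do not describe the same pairs there, which alone would be enough to make the two merges disagree.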

J


On 2/13/2013 1:07 PM, Hofbaur, Ulrich wrote:
> Dear all,
>
> I have an issue with a 1:1 merge in Stata. The merge is based on two variables. The first variable (string) consists of exactly 6 digits. The second variable (integer) consists of exactly 4 digits (there is no variation in length in either of the two variables). I tried two versions, and they yielded different results. Please note further that I use the same file to merge and the variables differ
>
> Option 1: Define a 10-digit string variable. To do so, convert "var2" to string and then concatenate var1 and var2. This gives "var3" (a 10-digit string; again, no variation in length), and I merge (1:1) on "var3". → Results in 15,839 matches
> Option 2: Merge (1:1) on var1 and var2 as separate variables → Results in 14,227 matches
>
> Does anybody know where this difference comes from? My gut feeling tells me that Option 2 is the more reliable one. However, I lack evidence for that. The abbreviated do-file is attached.
>
> Thank you very much for your support!
>
> Best,
> Ulrich
>
> ******* Do File **************
>
> use F:\001_Forschung\Daten\Cash&Acquisitions\file_A_prelim.dta, clear
>
>   * Option 1
> 	gen acquirorcusip_year=cusip_6dgt+next_year // corresponds to var 3 in the above description
> 	gen length_cusip_6dgt=length(cusip_6dgt)
> 	gen length_announcement_year=length(next_year)
> 	gen length_acquirorcusip_year=length(acquirorcusip_year)
> 	sum length_cusip_6dgt length_announcement_year length_acquirorcusip_year
>
> 	Variable        Obs        Mean    Std. Dev.       Min        Max
> 	length_cus~t     196217           6           0          6          6
> 	length_ann~r     196217           4           0          4          4
> 	length_acq~r     196217          10           0         10         10
>
> * Option 2
> 	gen announcement_year=next_year // corresponds to var 2 in the above description. Renamed because of file_B
> 	destring announcement_year, replace
> 	gen acquirorcusip=cusip_6dgt // corresponds to var 1 in the above description.
> 	sort acquirorcusip announcement_year
>
> save file_A.dta, replace
>
>
> * Option 1: Merge on the joint string variable
> 	use file_A.dta, clear
> 	merge 1:1 acquirorcusip_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>
> 	Result                           # of obs.
> 	-----------------------------------------
> 	not matched                       191,640
> 	    from master                   180,378  (_merge==1)
> 	    from using                     11,262  (_merge==2)
>
> 	matched                            15,839  (_merge==3)
> 	-----------------------------------------
>
> * Option 2:  Merge on two separate variables
> 	use file_A.dta, clear
> 	merge 1:1 acquirorcusip announcement_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>
> 	Result                           # of obs.
> 	-----------------------------------------
> 	not matched                       194,864
> 	    from master                   181,990  (_merge==1)
> 	    from using                     12,874  (_merge==2)
>
> 	matched                            14,227  (_merge==3)
> 	-----------------------------------------
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>
>
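
To see which master observations account for the gap between 15,839 and 14,227 matches, the two merges can be run one after the other on the same data and their match indicators cross-tabulated. This is a hedged sketch, not the poster's do-file: it assumes file_B.dta is reachable under that abbreviated name, and the variable names m_joint, m_separate and differs are hypothetical.

```stata
* Sketch: compare match status of the two merges observation by observation.
* Assumptions: abbreviated file names; m_joint, m_separate, differs are hypothetical.
use file_A.dta, clear

* Merge on the joint string key, then keep only observations from the master file
merge 1:1 acquirorcusip_year using file_B.dta, generate(m_joint)
drop if m_joint == 2

* Merge the same master observations on the two separate key variables
merge 1:1 acquirorcusip announcement_year using file_B.dta, generate(m_separate)
drop if m_separate == 2

* Off-diagonal cells are observations matched by one merge but not the other
tabulate m_joint m_separate

* Flag the disagreeing observations and inspect their keys
gen byte differs = m_joint != m_separate
list acquirorcusip announcement_year acquirorcusip_year if differs
```

Looking at the listed keys should show whether the year embedded in acquirorcusip_year agrees with announcement_year for the disagreeing cases.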



© Copyright 1996–2014 StataCorp LP