Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: "Hofbaur, Ulrich" <Ulrich.Hofbaur@whu.edu>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: AW: st: Different results for 1:1-merging using the same variables (int & string)

Date: Thu, 14 Feb 2013 08:17:03 +0000

Dear all,

Jeph, thanks for your suggestions! "next_year" is an integer converted to string using the -tostring- command, so I am simply concatenating two strings. I created the variables in both files in exactly the same way; I have just verified that.

Ulrich

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin
Sent: Wednesday, February 13, 2013 9:28 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Different results for 1:1-merging using the same variables (int & string)

I do not understand how you can calculate length(next_year) if "next_year" is an integer.

Do you create the variables in both files, or just the -master- file? If just the -master-, check that the -using- file variables have been constructed in the same way.

J

On 2/13/2013 1:07 PM, Hofbaur, Ulrich wrote:
> Dear all,
>
> I have an issue with conducting a 1:1 merge in Stata. The merge is based on two variables. The first variable (string) consists of exactly 6 digits. The second variable (integer) consists of exactly 4 digits (there is no variation in the number of digits in either variable). I tried two versions, and they yielded different results. Please note further that I use the same file to merge and the variables differ
>
> Option 1: Define a 10-digit string variable. To do so, convert "var2" to string and then add var1 and var2. This gives "var3" (a 10-digit string; again, no variation in the length of 10 digits), and I merge (1:1) on "var3". → Results in 15,839 matches
> Option 2: Merge (1:1) on var1 and var2 as separate variables. → Results in 14,227 matches
>
> Does anybody know where this difference comes from? My gut feeling tells me that Option 2 is the more reliable one. However, I lack evidence for that. The abbreviated do-file is attached.
>
> Thank you very much for your support!
>
> Best,
> Ulrich
>
> ******* Do File **************
>
> use F:\001_Forschung\Daten\Cash&Acquisitions\file_A_prelim.dta, clear
>
> * Option 1
> gen acquirorcusip_year=cusip_6dgt+next_year //corresponds to var 3 in the above description
> gen length_cusip_6dgt=length(cusip_6dgt)
> gen length_announcement_year=length(next_year)
> gen length_acquirorcusip_year=length(acquirorcusip_year)
> sum length_cusip_6dgt length_announcement_year length_acquirorcusip_year
>
>     Variable        Obs    Mean   Std. Dev.    Min    Max
>     length_cus~t  196217      6          0       6      6
>     length_ann~r  196217      4          0       4      4
>     length_acq~r  196217     10          0      10     10
>
> * Option 2
> gen announcement_year=next_year // corresponds to var 2 in the above description. Rename due to file_B
> destring announcement_year, replace
> gen acquirorcusip=cusip_6dgt // corresponds to var 1 in the above description.
> sort acquirorcusip announcement_year
>
> save file_A.dta, replace
>
>
> * Option 1: Merge on the joint string variable
> use file_A.dta, clear
> merge 1:1 acquirorcusip_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>
>     Result                           # of obs.
>     -----------------------------------------
>     not matched                       191,640
>         from master                   180,378  (_merge==1)
>         from using                     11,262  (_merge==2)
>
>     matched                            15,839  (_merge==3)
>     -----------------------------------------
>
> * Option 2: Merge on two separate variables
> use file_A.dta, clear
> merge 1:1 acquirorcusip announcement_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>
>     Result                           # of obs.
>     -----------------------------------------
>     not matched                       194,864
>         from master                   181,990  (_merge==1)
>         from using                     12,874  (_merge==2)
>
>     matched                            14,227  (_merge==3)
>     -----------------------------------------
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
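One possible way to narrow down the discrepancy reported in this thread is to check, inside the -using- file, whether the stored combined key agrees with a key rebuilt from the two separate variables. The following is only a sketch under assumptions: the path to file_B.dta is shortened for illustration, `announcement_year` is assumed to be numeric there, and `key_check` is a hypothetical helper variable not present in the original do-file.

```stata
* Sketch only: path shortened; key_check is a hypothetical helper variable.
* Assumes file_B.dta holds acquirorcusip (string), announcement_year (numeric)
* and acquirorcusip_year (string), as the two merges above imply.
use file_B.dta, clear

* Inspect storage types of the three key variables
describe acquirorcusip announcement_year acquirorcusip_year

* Rebuild the combined key from the two separate key variables
generate key_check = acquirorcusip + string(announcement_year, "%4.0f")

* Count and list observations where the stored key and the rebuilt key disagree
count if key_check != acquirorcusip_year
list acquirorcusip announcement_year acquirorcusip_year if key_check != acquirorcusip_year

* Stray leading or trailing blanks count as characters in string keys
count if acquirorcusip != trim(acquirorcusip)
```

If the first count is nonzero, the two merge keys do not describe the same observations in file_B.dta, which would be one candidate explanation for the differing match counts; a zero count would point the search elsewhere.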

**Follow-Ups**:

- **Re: st: Different results for 1:1-merging using the same variables (int & string)** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:

- **st: Different results for 1:1-merging using the same variables (int & string)** *From:* "Hofbaur, Ulrich" <Ulrich.Hofbaur@whu.edu>
- **Re: st: Different results for 1:1-merging using the same variables (int & string)** *From:* Jeph Herrin <stata@spandrel.net>
