

- **From:** Nick Cox <njcoxstata@gmail.com>
- **To:** statalist@hsphsun2.harvard.edu
- **Subject:** Re: st: Different results for 1:1-merging using the same variables (int & string)
- **Date:** Thu, 14 Feb 2013 10:09:17 +0000

A side detail, but

gen newstrvar = strvar + string(numvar)

is a simpler way to do that in this situation, where -numvar- contains 4-digit integers. Calling up -tostring- and creating a new variable can be avoided.

More to the point, did you look closely at some or all of the observations where you get a different result?

Nick

On Thu, Feb 14, 2013 at 8:17 AM, Hofbaur, Ulrich <Ulrich.Hofbaur@whu.edu> wrote:

> Jeph, thanks for your suggestions! "next_year" is an integer converted to a string with the -tostring- command, so I am simply adding two strings. I created the variables in both files, in exactly the same way; I just validated that.
>
> Jeph Herrin wrote:
> I do not understand how you can calculate length(next_year) if "next_year" is an integer.
>
> Do you create the variables in both files, or just the -master- file? If just the -master-, check that the -using- file variables have been constructed in the same way.
>
> On 2/13/2013 1:07 PM, Hofbaur, Ulrich wrote:
>> I have an issue with conducting a 1:1 merge in Stata. The merge is based on two variables. The first variable (string) consists of exactly 6 digits. The second variable (integer) consists of exactly 4 digits (there is no variation in the number of digits in either variable). I tried two versions, and they yielded different results. Please further note that I use the same file to merge and the variables differ
>>
>> Option 1: Define a 10-digit string variable. To do so, convert "var2" to string and then add (concatenate) var1 and var2. This gives "var3" (a 10-digit string, again with no variation in length), and I merge (1:1) on "var3". → Results in 15,839 matches.
>> Option 2: Merge (1:1) on var1 and var2 as separate variables. → Results in 14,227 matches.
>>
>> Does anybody know where this difference comes from? My gut feeling tells me that Option 2 is the more reliable one. However, I lack evidence for that. The abbreviated do-file is attached.
>>
>> Thank you very much for your support!
>>
>> Best,
>> Ulrich
>>
>> ******* Do File **************
>>
>> use F:\001_Forschung\Daten\Cash&Acquisitions\file_A_prelim.dta, clear
>>
>> * Option 1
>> gen acquirorcusip_year=cusip_6dgt+next_year //corresponds to var 3 in the above description
>> gen length_cusip_6dgt=length(cusip_6dgt)
>> gen length_announcement_year=length(next_year)
>> gen length_acquirorcusip_year=length(acquirorcusip_year)
>> sum length_cusip_6dgt length_announcement_year length_acquirorcusip_year
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>> length_cus~t |    196217           6           0          6          6
>> length_ann~r |    196217           4           0          4          4
>> length_acq~r |    196217          10           0         10         10
>>
>> * Option 2
>> gen announcement_year=next_year // corresponds to var 2 in the above description. Rename due to file_B
>> destring announcement_year, replace
>> gen acquirorcusip=cusip_6dgt // corresponds to var 1 in the above description.
>> sort acquirorcusip announcement_year
>>
>> save file_A.dta, replace
>>
>> * Option 1: Merge on the joint string variable
>> use file_A.dta, clear
>> merge 1:1 acquirorcusip_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                       191,640
>>         from master                   180,378  (_merge==1)
>>         from using                     11,262  (_merge==2)
>>
>>     matched                            15,839  (_merge==3)
>>     -----------------------------------------
>>
>> * Option 2: Merge on two separate variables
>> use file_A.dta, clear
>> merge 1:1 acquirorcusip announcement_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                       194,864
>>         from master                   181,990  (_merge==1)
>>         from using                     12,874  (_merge==2)
>>
>>     matched                            14,227  (_merge==3)
>>     -----------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
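Nick's one-liner in the thread replaces the -tostring- step: -string()- converts the 4-digit integer to text, and `+` concatenates the two strings in a single -generate-. A minimal sketch on made-up data follows; the values and the variable name `year` are hypothetical, chosen only to mirror the 6-character identifier plus 4-digit year described in the thread.

```stata
* Hypothetical toy data: a 6-character string identifier and a 4-digit integer year
clear
input str6 cusip_6dgt int year
"00103X" 1995
"12345A" 2001
end

* string() turns the integer into text and + concatenates the strings,
* so neither -tostring- nor an intermediate string variable is needed
generate acquirorcusip_year = cusip_6dgt + string(year)

* Check that every joint key has the expected 6 + 4 = 10 characters
assert length(acquirorcusip_year) == 10
list
```

This relies on the integers being short enough for the default display format used by -string()-, which holds for 4-digit years.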
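Following up on Nick's question and Jeph's point about how the -using- variables were built: if, in both files, every identifier is exactly 6 characters and every year a 4-digit integer, the concatenated key and the pair of separate keys carry the same information, so the two merges can only disagree when, in at least one file, the joint key was not built from the same values as the separate keys. The sketch below uses hypothetical toy data and a hypothetical tempfile `B`. It plants one such inconsistency in the using data, flags it with a -count if- check, and shows the two merges returning different match counts. It illustrates one possible mechanism and is not a diagnosis of the poster's files.

```stata
* Hypothetical using data: the second joint key was built from the wrong year
clear
input str6 acquirorcusip int announcement_year str10 acquirorcusip_year
"00103X" 1995 "00103X1995"
"12345A" 2001 "12345A2002"
end

* Diagnostic: count observations whose joint key disagrees with the separate keys
count if acquirorcusip_year != acquirorcusip + string(announcement_year)
tempfile B
save `B'

* Hypothetical master data, with the joint key built consistently
clear
input str6 acquirorcusip int announcement_year
"00103X" 1995
"12345A" 2001
end
generate acquirorcusip_year = acquirorcusip + string(announcement_year)

* Option 1: merge on the joint string key (only the first observation matches)
preserve
merge 1:1 acquirorcusip_year using `B'
count if _merge == 3
restore

* Option 2: merge on the two separate keys (both observations match)
merge 1:1 acquirorcusip announcement_year using `B'
count if _merge == 3
```

On the real data, the same -count if- line could be run after -use- on each of file_A.dta and file_B.dta. A nonzero count would point to the observations worth listing and inspecting. Which option then yields more matches depends on where the inconsistency sits.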

**References**:

- **st: Different results for 1:1-merging using the same variables (int & string)**, *From:* "Hofbaur, Ulrich" <Ulrich.Hofbaur@whu.edu>
- **Re: st: Different results for 1:1-merging using the same variables (int & string)**, *From:* Jeph Herrin <stata@spandrel.net>
- **AW: st: Different results for 1:1-merging using the same variables (int & string)**, *From:* "Hofbaur, Ulrich" <Ulrich.Hofbaur@whu.edu>
