



st: Preparing the data for merge - No unique identifier and problems with regexs


From   Andreanne Tremblay Simard <atremblay10@schulich.yorku.ca>
To   statalist <statalist@hsphsun2.harvard.edu>
Subject   st: Preparing the data for merge - No unique identifier and problems with regexs
Date   Wed, 18 May 2011 15:34:38 -0400

Dear Stata Users,

I have one dataset describing companies, with the firm name (fname) as an identifier. 
I have another dataset describing mutual funds owned by the firms from the first dataset, with the variable fund_name as an identifier. 
I eventually want to merge the two datasets (one-to-many). 

However, the fund names (fund_name) and firms' names (fname) are not the same, although they generally have a common part. 
For example, here are two firm names (fname) and the funds operated by these firms (fund_name):
fname
Alpha Capital
3g Top Capital

fund_name
Alpha Growth
3g Top Capital Fund I
3g Income

How can I match these observations, so that Alpha Capital goes with Alpha Growth, and 3g Top Capital goes with both 3g Top Capital Fund I and 3g Income? That is, I want to get the following:
fname           fund_name
Alpha Capital   Alpha Growth
3g Top Capital  3g Top Capital Fund I
3g Top Capital  3g Income

My original idea was to use regexs() to extract the first part of each firm name and fund name, and then to merge on the extracted part. However, I don't seem to be doing this right, since regexs() seems to truncate the names from the end. Because the names contain a varying number of substrings (separated by spaces), I can't rely on, say, the second-to-last substring: a name with only one substring has no second-to-last.
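
To make the idea concrete, here is a minimal Stata sketch of the first-word approach, not a tested solution. It assumes the two datasets are saved as firms.dta and funds.dta (hypothetical file names) and that the first space-delimited word is enough to identify a firm. The variable key and the file funds_keyed.dta are also hypothetical names introduced for illustration. lower() is applied so that differences in capitalization do not block a match.

```stata
* Sketch only: firms.dta, funds.dta, funds_keyed.dta and key are assumed names.
* Assumption: the first space-delimited word of a name identifies the firm.

* Build the key in the fund data: "^([^ ]+)" captures everything before the first space.
use funds, clear
generate key = lower(regexs(1)) if regexm(fund_name, "^([^ ]+)")
save funds_keyed, replace

* Build the same key in the firm data.
use firms, clear
generate key = lower(regexs(1)) if regexm(fname, "^([^ ]+)")

* Stop with an error if key does not uniquely identify the firms.
isid key

* One firm to many funds.
merge 1:m key using funds_keyed
```

With the names listed above, both 3g Top Capital Fund I and 3g Income would get the key 3g and so be paired with 3g Top Capital. The approach breaks down as soon as two firms share a first word, which isid would flag. word(fname, 1) is a regex-free way to take the same first word.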

Thank you for your input, and for helping me learn more about Stata.
Best regards,

Andréanne Tremblay


Schulich School of Business, York University





