Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: match variable across two tables


From   Rongrong Zhang <r05zhang@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: match variable across two tables
Date   Wed, 18 Dec 2013 21:42:46 -0500

Thanks Joe !

On Wed, Dec 18, 2013 at 4:32 PM, Joe Canner <jcanner1@jhmi.edu> wrote:
> Rochelle,
>
> There are two user-written commands available from SSC that might be of interest: -reclink- and -vmatch-.  Both do some sort of "fuzzy" matching.  I haven't used them, though, so I can't help you much beyond suggesting that you look into them.
>
> Regards,
> Joe Canner
> Johns Hopkins University School of Medicine
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rongrong Zhang
> Sent: Wednesday, December 18, 2013 2:15 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: match variable across two tables
>
> Dear STATALISTER,
>
> I have two tables:
>
> Table 1 has 3 variables  I-O number, I-O Name ,   Related 1997 NAICS codes.
>
> Table 2 has 1 variable 1997 NAICS codes.
>
> I want to link these two tables based on NAICS code. However, the level of details on NAICS code does not match one-to-one because the tables come from different data source. My goal is to know which NAICS code correspond to which I-O number. I can’t use Table 1 only, because TABLE 2 is produced from Wharton Research Database which has company level financial data – I will use later on.
>
> By different details I mean : e.g.
>
> table 1:
>
> I-O number  I-O Name                  1997 NAICS codes
>
> 1110        Crop production
>
> 1111A0    Oilseed farming
>          11111      11112
>
> 1111B0    Grain farming             11113      11114      11115
> 11116      11119
>
> 111200    Vegetable and melon farming
>                    1112
> 111400 Greenhouse and nursery production
>                 1114
> 111910 Tobacco farming
>                 11191
> 111920 Cotton farming
>                 11192
> 1119A0 Sugarcane and sugar beet
>       11193 111991
> 1119B0 All other crop farming
> 11194 111992 111998
>
> in the above example, I present industry 1110 and its subindustries 1111A0, 1111B0, 111200, each of the subindustries correspond to a few (or a single) NAICS code (north america industry classification system).
>
> table 2:
> NAICS CODE
> 111
> 1111
> 111150
> 111199
> 111219
> 111310
> 111320
> 111332
> 111334
> 111335
> 111339
> 1114
> 111411
> 111419
> 111421
> 111422
> 111920
> 111930
> 111940
> 111998
>
> if I enforce exact match, then table 2  111150 matches with table 1
> 1111B0,    table 2 1112l9 may be matched with  111200 table 1 I-O.
>
> My question :
> 1.could you give a sample code/function to do exact match? note, if first 5digit match, and drop last 0 (naics), we consider that a match 2. if I want to increase match, how could I change the program to do 4 digit match
>
> thanks a bunch,
>
> --
> Best,
> Rochelle
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



-- 
-Best,
R

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index