Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: RE: match variable across two tables


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: match variable across two tables
Date   Wed, 18 Dec 2013 21:32:24 +0000

Rochelle,

There are two user-written commands available from SSC that might be of interest: -reclink- and -vmatch-.  Both do some sort of "fuzzy" matching.  I haven't used them, though, so I can't help you much beyond suggesting that you look into them.

Regards,
Joe Canner
Johns Hopkins University School of Medicine

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Rongrong Zhang
Sent: Wednesday, December 18, 2013 2:15 PM
To: [email protected]
Subject: st: match variable across two tables

Dear STATALISTER,

I have two tables:

Table 1 has 3 variables  I-O number, I-O Name ,   Related 1997 NAICS codes.

Table 2 has 1 variable 1997 NAICS codes.

I want to link these two tables based on NAICS code. However, the level of details on NAICS code does not match one-to-one because the tables come from different data source. My goal is to know which NAICS code correspond to which I-O number. I can’t use Table 1 only, because TABLE 2 is produced from Wharton Research Database which has company level financial data – I will use later on.

By different details I mean : e.g.

table 1:

I-O number  I-O Name                  1997 NAICS codes

1110        Crop production

1111A0    Oilseed farming
         11111      11112

1111B0    Grain farming             11113      11114      11115
11116      11119

111200    Vegetable and melon farming
                   1112
111400 Greenhouse and nursery production
                1114
111910 Tobacco farming
                11191
111920 Cotton farming
                11192
1119A0 Sugarcane and sugar beet
      11193 111991
1119B0 All other crop farming
11194 111992 111998

in the above example, I present industry 1110 and its subindustries 1111A0, 1111B0, 111200, each of the subindustries correspond to a few (or a single) NAICS code (north america industry classification system).

table 2:
NAICS CODE
111
1111
111150
111199
111219
111310
111320
111332
111334
111335
111339
1114
111411
111419
111421
111422
111920
111930
111940
111998

if I enforce exact match, then table 2  111150 matches with table 1
1111B0,    table 2 1112l9 may be matched with  111200 table 1 I-O.

My question :
1.could you give a sample code/function to do exact match? note, if first 5digit match, and drop last 0 (naics), we consider that a match 2. if I want to increase match, how could I change the program to do 4 digit match

thanks a bunch,

--
Best,
Rochelle

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index