Statalist



st: matching databases


From   kokootchke <kokootchke@hotmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: matching databases
Date   Mon, 25 Aug 2008 12:55:48 -0400

Hello!

I have two manufacturing databases that I need to put together. The problem is that each database is classified under a different coding system. I do have the codes to match the observations accordingly, but I am not sure what the best way to do the matching is.

Database A contains variables such as total # of employees by industrial sector (v1), total value of shipments by industrial sector (v2), and annual growth rates of the industrial sector (v3). These industrial sectors are according to the SIC87 industry classification, so the database would look like this:

sic87   yr      v1      v2          v3
2011    93   124.4    53.3    .0043177
2011    94   119.5    50.7   -.0043294
2011    95   125.8    51.4   -.0102257
2011    96     130    51.6   -.0452671
2013    93    48.7     2.1           .
2013    94    49.6     2.4     .047534
2013    95    48.5       2    .0065023
2014    95     9.6     1.6     .068254
2015    95     8.2     5.3    .0935813

I need to translate all of this database into the ISIC3 industry classification. The problem is that one SIC87 category can map into several ISIC3 categories, and several SIC87 categories can map into a single ISIC3 category.

For instance, suppose that my correspondences are as follows:

sic87   isic3
2011   2020
2011   2022
2011   2026
2013   2100
2014   2100
2015   2100

This means that sic87 category 2011 is now considered 3 separate categories (2020, 2022, and 2026), while all three categories 2013, 2014, and 2015 are now considered only one category 2100.
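
To make the two directions of the correspondence explicit, here is a minimal sketch in Python (not Stata) that counts, for the example pairs above, how many ISIC3 codes each SIC87 code maps to and how many SIC87 codes feed each ISIC3 code. The names `crosswalk`, `fan_out`, and `fan_in` are made up for this illustration.

```python
from collections import defaultdict

# Correspondence pairs copied from the example above: (sic87, isic3).
crosswalk = [
    (2011, 2020), (2011, 2022), (2011, 2026),
    (2013, 2100), (2014, 2100), (2015, 2100),
]

targets = defaultdict(list)  # sic87 -> ISIC3 codes it maps to
sources = defaultdict(list)  # isic3 -> SIC87 codes that feed it
for sic, isic in crosswalk:
    targets[sic].append(isic)
    sources[isic].append(sic)

# Number of ISIC3 codes per SIC87 code (one-to-many when > 1).
fan_out = {sic: len(codes) for sic, codes in targets.items()}
# Number of SIC87 codes per ISIC3 code (many-to-one when > 1).
fan_in = {isic: len(codes) for isic, codes in sources.items()}

print(fan_out)
print(fan_in)
```

A SIC87 code with a count above one in `fan_out` has to be split, and an ISIC3 code with a count above one in `fan_in` has to be combined.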

I want to do the matching in two separate ways:

(a) The first way deals with variables that can simply be added across sectors, like the total # of employees by sector (v1) or the value of shipments by sector (v2). In this case, if multiple SIC87 categories are now classified as just one ISIC3 category, we can just add the numbers across categories; if just one SIC87 category is now classified as several ISIC3 categories, we can split the SIC87 number equally, dividing it by the number of new ISIC3 categories.

(b) The second way deals with variables that cannot simply be added because the sum would be meaningless. For example, for v3, when multiple SIC87 categories have different growth rates and these categories translate into only one ISIC3 category, we can take the average across those categories. On the other hand, if one SIC87 category translates into several ISIC3 categories, each new category keeps the SIC87 growth rate.

So, if we look at SIC87 category 2011 for year 95, I want my code to do the following calculations:

isic3  yr   v1            v2           v3
2020  95  =125.8/3  =51.4/3   =-.0102257
2022  95  =125.8/3  =51.4/3   =-.0102257
2026  95  =125.8/3  =51.4/3   =-.0102257


while SIC87 categories 2013, 2014, and 2015 for the same year would all fuse into one ISIC3 category to look like this:

isic3  yr   v1                     v2                 v3
2100  95  =48.5+9.6+8.2  =2+1.6+5.3   =(.0065023+.068254+.0935813)/3
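
The calculations above can be sketched as follows. This is only an illustration of the intended arithmetic, written in Python rather than Stata; the function name `convert`, the tuple layout, and the use of `None` for the missing growth rate are assumptions made for the sketch. It also assumes that a missing v3 is skipped when averaging and that v1 and v2 are never missing.

```python
from collections import defaultdict

# Correspondence pairs from the example: (sic87, isic3).
crosswalk = [
    (2011, 2020), (2011, 2022), (2011, 2026),
    (2013, 2100), (2014, 2100), (2015, 2100),
]

# Rows of database A: (sic87, yr, v1, v2, v3); None stands for ".".
data = [
    (2011, 93, 124.4, 53.3, 0.0043177),
    (2011, 94, 119.5, 50.7, -0.0043294),
    (2011, 95, 125.8, 51.4, -0.0102257),
    (2011, 96, 130.0, 51.6, -0.0452671),
    (2013, 93, 48.7, 2.1, None),
    (2013, 94, 49.6, 2.4, 0.047534),
    (2013, 95, 48.5, 2.0, 0.0065023),
    (2014, 95, 9.6, 1.6, 0.068254),
    (2015, 95, 8.2, 5.3, 0.0935813),
]


def convert(data, crosswalk):
    """Return {(isic3, yr): (v1, v2, v3)} using rules (a) and (b)."""
    targets = defaultdict(list)  # sic87 -> ISIC3 codes it maps to
    for sic, isic in crosswalk:
        targets[sic].append(isic)

    sums = defaultdict(lambda: [0.0, 0.0])  # running totals of v1, v2
    rates = defaultdict(list)               # v3 values to be averaged
    for sic, yr, v1, v2, v3 in data:
        codes = targets.get(sic, [])  # rows without a match are dropped
        for isic in codes:
            key = (isic, yr)
            # Rule (a): split equally across the new codes, then add up.
            sums[key][0] += v1 / len(codes)
            sums[key][1] += v2 / len(codes)
            # Rule (b): carry the rate over; it is averaged below.
            if v3 is not None:
                rates[key].append(v3)

    result = {}
    for key in sorted(sums):
        r = rates.get(key, [])
        mean_rate = sum(r) / len(r) if r else None
        result[key] = (sums[key][0], sums[key][1], mean_rate)
    return result


result = convert(data, crosswalk)
for (isic, yr), values in result.items():
    print(isic, yr, values)
```

The average in rule (b) is unweighted, exactly as in the example for category 2100; weighting by v1 or v2 would be a different choice.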

Any ideas on how to achieve this?

Thank you.
Adrian


 



