[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: matching databases (concordance tables)

From	kokootchke <[email protected]>
To	<[email protected]>
Subject	RE: st: matching databases (concordance tables)
Date	Tue, 26 Aug 2008 15:01:31 -0400

In case anyone is interested in using concordance tables, this is the approach I followed. For the first set of variables (the additive ones), I basically compute the sum by ISIC sector. For the second set of variables, the ones that cannot be simply added (because they are growth rates, etc.), I basically compute a weighted average by value added. This may be very specific for my own purposes but I think it can easily be suited to other people's preferences.

/* Original database has sic87 industry codes. Using the correspondence tables between sic87 and isic3 industry codes, I create one long correspondence table where each row is a sic87 category associated with one and only one isic3. In case one sic87 codes are associated with X isic3 categories, we would have X rows with the same sic87 code but different associated isic3. Once I have this long correspondence table, I merge the original sic87 database with the correspondence table by sic87, and then... */

/* Create variable that tells me how many times sic87 is repeated. This will be used to compute partial statistics. For example, for the "additive" variables like total # of employees, if one same sic87 is associated with 3 different isic3, the total number will be split between these 3 different isic3 categories. */

bys sic87: g N = _N

local varlist "emp pay prode prodh prodw vship matcost vadd invest invent energy cap equip plant"
foreach var of local varlist {
 g t_`var' = `var'/N
 bys isic3: egen tot`var' = sum(t_`var')
}

/* t_`var' is a temporary variable that tells me how many of the original number in the sic87 category are being assigned to that isic3 category. On the other hand, tot`var' adds all of these partial numbers by isic3 industry category. */

save temp, replace
collapse (mean) tot*, by(isic3)
renpfix tot
sort isic3
save temp1, replace

/* For the non-additive variables, like price deflators and growth rates, we compute the "average" across all of the original sic87 categories that compose the new isic3 category, weighted by value added. For instance, suppose that sic87 categories 1000, 1020 and 1040 are all assigned to isic3 category 1100, and suppose that each of these industry sectors grows at 10%, 5% and 7%, respectively. Then, the average growth rate of the isic3 category 1100 will be the average of those 3 growth rates, weighted by their value added. */

local varlist "piship pimat piinv pien dtfp5 tfp5 dtfp4 tfp4"
foreach var of local varlist {
 use temp, clear
 keep isic3 `var' t_vadd
 collapse (mean) `var' [w = t_vadd], by(isic3)
 sort isic3
 save temp_`var', replace
}

/* And finally, we merge all of the collapsed datasets into one big dataset. */

use temp1, clear
merge isic3 using temp_piship temp_pimat temp_piinv temp_pien temp_dtfp5 temp_tfp5 temp_dtfp4 temp_tfp4

> From: [email protected]
> To: [email protected]
> Subject: RE: st: matching databases
> Date: Mon, 25 Aug 2008 14:17:12 -0400
>
> Thank you, Rachel.
>
> Yes, I actually got the ISIC3 - SIC87 concordance from this website... but I don't see any code that does what I need to do. I am sure other authors have written some code to do what I want to do but I haven't found it on the web, so I am writing my own code but it involves lots of loops and it looks complicated... I was wondering if there's a more efficient way of doing it.
>
> Thank you.
> Adrian
>
>
>
>
>> Date: Mon, 25 Aug 2008 13:13:55 -0400
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: st: matching databases
>>
>> google up "concordance files," for example
>>
>> http://www.macalester.edu/research/economics/page/haveman/Trade.Resources/tradeconcordances.html
>>
>>
>>
>>
>> On Mon, Aug 25, 2008 at 12:55 PM, kokootchke wrote:
>>> Hello!
>>>
>>> I have two manufacturing databases that I need to put together. The problem is that each database is classified under a different coding system. I do have the codes to match the observations accordingly but I am not sure of what's the best to do the matching.
>>>
>>> Database A contains variables such as total # of employees by industrial sector (v1), total value of shipments by industrial sector (v2), and annual growth rates of the industrial sector (v3). These industrial sectors are according to the SIC87 industry classification, so the database would look like this:
>>>
>>> sic87 yr v1 v2 v3
>>> 2011 93 124.4 53.3 .0043177
>>> 2011 94 119.5 50.7 -.0043294
>>> 2011 95 125.8 51.4 -.0102257
>>> 2011 96 130 51.6 -.0452671
>>> 2013 93 48.7 2.1 .
>>> 2013 94 49.6 2.4 .047534
>>> 2013 95 48.5 2 .0065023
>>> 2014 95 9.6 1.6 .068254
>>> 2015 95 8.2 5.3 .0935813
>>>
>>> I need to translate all of these database into the ISIC3 industry classification. The problem is that one SIC87 category can go into several ISIC3 categories and also several SIC87 categories can go into only one ISIC3 category.
>>>
>>> For instance, suppose that my correspondences are as follows:
>>>
>>> sic87 isic3
>>> 2011 2020
>>> 2011 2022
>>> 2011 2026
>>> 2013 2100
>>> 2014 2100
>>> 2015 2100
>>>
>>> This means that sic87 category 2011 is now considered 3 separate categories (2020, 2022, and 2026), while all three categories 2013, 2014, and 2015 are now considered only one category 2100.
>>>
>>> I want to do the matching in two separate ways:
>>>
>>> (a) The first way deals with variables that one can easily add by sector, like the total # of employees by sector (v1) or the value of shipments by sector (v2). In this case, if multiple SIC87 categories are now classified as just one ISIC3 category, we can just add the numbers across categories; if just one SIC87 category is now classified as several ISIC3 categories, we can split the SIC87 number by the number of new ISIC3 categories.
>>>
>>> (b) The second one deals with variables that are not possible to just add because the sum would be meaningless. For example, for the case of v3, when multiple SIC87 categories have different growth rates and these categories translate into only one ISIC3 category, we can take the average by sector. On the other hand, if
>>>
>>> So, if we look at SIC87 category 2011 for year 95, I want my code to do the following calculations:
>>>
>>> isic3 yr v1 v2 v3
>>> 2020 95 =125.8/3 =51.4/3 =-.0102257
>>> 2022 95 =125.8/3 =51.4/3 =-.0102257
>>> 2026 95 =125.8/3 =51.4/3 =-.0102257
>>>
>>>
>>> while SIC87 categories 2013, 2014, and 2015 for the same year would all fuse into one ISIC3 category to look like this:
>>>
>>> isic3 yr v1 v2 v3
>>> 2100 95 =48.5+9.6+8.2 =2+1.6+5.3 =(.0065023+.068254+.0935813)/3
>>>
>>> Any ideas on how to achieve this?
>>>
>>> Thank you.
>>> Adrian
>>>
>>>
>>>
>>>
>>> _________________________________________________________________
>>> Talk to your Yahoo! Friends via Windows Live Messenger. Find out how.
>>> http://www.windowslive.com/explore/messenger?ocid=TXT_TAGLM_WL_messenger_yahoo_082008
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
> _________________________________________________________________
> Get ideas on sharing photos from people like you. Find new ways to share.
> http://www.windowslive.com/explore/photogallery/posts?ocid=TXT_TAGLM_WL_Photo_Gallery_082008
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

_________________________________________________________________
Get ideas on sharing photos from people like you.  Find new ways to share.
http://www.windowslive.com/explore/photogallery/posts?ocid=TXT_TAGLM_WL_Photo_Gallery_082008
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: age-adjusted means
  - From: "Mona Mowafi" <[email protected]>
- Re: st: age-adjusted means
  - From: Maarten buis <[email protected]>
- Re: st: age-adjusted means
  - From: "Mona Mowafi" <[email protected]>
- st: matching databases
  - From: kokootchke <[email protected]>
- Re: st: matching databases
  - From: "Rafal Raciborski" <[email protected]>
- RE: st: matching databases
  - From: kokootchke <[email protected]>

Prev by Date: st: Stata randomly drops variables
Next by Date: Re: st: Stata randomly drops variables
Previous by thread: RE: st: matching databases
Next by thread: st: Question for qll routine
Index(es):
- Date
- Thread