Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Χρυσούλα Γιαννικοπούλου <chrygiann@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: problem with commands
Date: Tue, 7 Sep 2010 12:49:02 +0200
Dear Statalisters,

I have a dataset consisting of a number of companies which are all subsidiaries of the same company, X. The following variables describe each of these companies and their mother company:

1) "id_mother", the identity of the mother company
2) "id_sub", the identity of the subsidiary company
3) "year_inc_sub", the year of incorporation of the subsidiary
4) "sic_mother", a 3-digit industry classification code for the mother
5) "sic_sub", a 3-digit industry classification code for the subsidiary

The dataset looks like this:

    id_mother  sic_mother  id_sub    sic_sub  year_inc_sub
    X          731         JP0JPN    738      1982
    X          731         JP680JPN  737      2002
    X          731         JP0JPN    899      2002
    X          731         JP0JPN    355      1953
    X          731         JPJPN     355      2001
    X          731         JPN       502      1972
    X          731         JPJPN     357      1960

Based on this information I want to create a new dataset with a time-series structure, referring to the mother company X over the period 1990-2005. The new dataset should contain the following variables:

1) "id_mother", as above
2) "year", the year of the observation
3) "diversification", a new variable whose construction I explain below
The diversification variable should be constructed as follows. For the observation for year 1990, the value of diversification is the sum of a, b and c below:

a) take the number of subsidiaries incorporated in 1990 or earlier whose 1st digit of sic_sub differs from the 1st digit of sic_mother, and multiply it by 3; divide the result by the total number of subsidiaries incorporated in 1990 or earlier (in the example, the total number of companies existing in 1990 is 4)
b) take the number of subsidiaries incorporated in 1990 or earlier for which only the 1st digit of sic_sub is the same as the 1st digit of sic_mother, and multiply it by 2; divide the result by the total number of subsidiaries incorporated in 1990 or earlier
c) take the number of subsidiaries incorporated in 1990 or earlier whose 1st and 2nd digits of sic_sub are the same as the 1st and 2nd digits of sic_mother but whose 3rd digit differs, and multiply it by 1; divide the result by the total number of subsidiaries incorporated in 1990 or earlier

The same logic should apply to the observations for years 1991-2005. The expected dataset should look like this:

    id_mother  year  diversification
    X          1990  a+b+c for subsidiaries incorporated in or before 1990
    X          1991  a+b+c for subsidiaries incorporated in or before 1991
    X          1992  a+b+c for subsidiaries incorporated in or before 1992
    ...
    X          2005  a+b+c for subsidiaries incorporated in or before 2005

Thank you in advance.
Chrysa

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
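Because a, b and c all share the same denominator, the rule reduces to giving each subsidiary a weight (3 if the 1st SIC digit differs, 2 if only the 1st digit matches, 1 if the first two digits match but the 3rd differs) and dividing the sum of weights by the number of subsidiaries existing in that year. The sketch below illustrates that logic in Python only; it is not Stata code, and the names `sic_weight` and `diversification_panel` are hypothetical. It assumes SIC codes are held as 3-character strings (so leading zeros are not lost), treats every row as a distinct subsidiary even though some id_sub values repeat in the example, and gives weight 0 to a subsidiary whose SIC code is identical to the mother's, a case the question does not address (such a subsidiary still counts in the denominator).

```python
from dataclasses import dataclass


@dataclass
class Subsidiary:
    id_mother: str
    sic_mother: str
    id_sub: str
    sic_sub: str
    year_inc_sub: int


def sic_weight(sic_mother: str, sic_sub: str) -> int:
    """Weight used in a, b or c, based on how many leading SIC digits match."""
    m, s = sic_mother.zfill(3), sic_sub.zfill(3)  # assumption: codes may lose leading zeros
    if s[0] != m[0]:
        return 3  # component a: 1st digit differs
    if s[1] != m[1]:
        return 2  # component b: only the 1st digit matches
    if s[2] != m[2]:
        return 1  # component c: 1st and 2nd digits match, 3rd differs
    return 0  # identical code: not covered by the question, assumed weight 0


def diversification_panel(subs, start=1990, end=2005):
    """Return {(id_mother, year): a+b+c} using subsidiaries incorporated by each year."""
    panel = {}
    for mother in sorted({s.id_mother for s in subs}):
        own = [s for s in subs if s.id_mother == mother]
        for year in range(start, end + 1):
            existing = [s for s in own if s.year_inc_sub <= year]
            if not existing:
                panel[(mother, year)] = None  # no subsidiaries yet: left undefined
                continue
            weights = sum(sic_weight(s.sic_mother, s.sic_sub) for s in existing)
            # a + b + c = (3*n_a + 2*n_b + 1*n_c) / N, i.e. the mean weight
            panel[(mother, year)] = weights / len(existing)
    return panel


# The example data from the question
data = [
    Subsidiary("X", "731", "JP0JPN", "738", 1982),
    Subsidiary("X", "731", "JP680JPN", "737", 2002),
    Subsidiary("X", "731", "JP0JPN", "899", 2002),
    Subsidiary("X", "731", "JP0JPN", "355", 1953),
    Subsidiary("X", "731", "JPJPN", "355", 2001),
    Subsidiary("X", "731", "JPN", "502", 1972),
    Subsidiary("X", "731", "JPJPN", "357", 1960),
]

panel = diversification_panel(data)
for year in (1990, 2001, 2002):
    print(year, round(panel[("X", year)], 4))
# 1990 2.5
# 2001 2.6
# 2002 2.4286
```

For 1990 this gives (3 × 3 + 1 × 1) / 4 = 2.5: the subsidiaries with codes 355, 502 and 357 fall under a, the one with 738 falls under c, and none falls under b. In Stata the same steps would amount to building the weight from the individual digits of the (string) SIC codes and then, for each year, averaging it over subsidiaries with year_inc_sub less than or equal to that year.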