
# st: problem with commands

- From: Χρυσούλα Γιαννικοπούλου
- To: statalist@hsphsun2.harvard.edu
- Subject: st: problem with commands
- Date: Tue, 7 Sep 2010 12:49:02 +0200

```
Dear Statalisters,

I have a dataset of companies that are all subsidiaries of the same
company X. The following variables describe each subsidiary and its
mother company:
1) "id_mother", the identifier of the mother company
2) "id_sub", the identifier of the subsidiary
3) "year_inc_sub", the year in which the subsidiary was incorporated
4) "sic_mother", the 3-digit SIC (Standard Industrial Classification)
   code of the mother company
5) "sic_sub", the 3-digit SIC code of the subsidiary

The dataset looks like this:

id_mother  sic_mother  id_sub    sic_sub  year_inc_sub
X          731         JP0JPN    738      1982
X          731         JP680JPN  737      2002
X          731         JP0JPN    899      2002
X          731         JP0JPN    355      1953
X          731         JPJPN     355      2001
X          731         JPN       502      1972
X          731         JPJPN     357      1960

Based on this information, I want to create a new dataset with a
time-series structure for the mother company "X" over the period
1990-2005. The new dataset should contain the following variables:
1) "id_mother", as above;
2) "year", the year of the observation;
3) "diversification", a new variable constructed as explained below.

The diversification variable should be constructed as follows. For the
1990 observation, its value is the sum of a, b, and c below:

a) Take the number of subsidiaries incorporated in 1990 or earlier
whose sic_sub differs from sic_mother in the 1st digit, and multiply
it by 3; divide the result by the total number of subsidiaries
incorporated in 1990 or earlier (in the example, 4 subsidiaries exist
in 1990).
b) Take the number of subsidiaries incorporated in 1990 or earlier
whose sic_sub matches sic_mother in the 1st digit only, and multiply
it by 2; divide the result by the total number of subsidiaries
incorporated in 1990 or earlier.
c) Take the number of subsidiaries incorporated in 1990 or earlier
whose sic_sub matches sic_mother in the 1st and 2nd digits but differs
in the 3rd digit, and multiply it by 1; divide the result by the total
number of subsidiaries incorporated in 1990 or earlier.

The same logic applies to each year from 1991 to 2005, counting only
subsidiaries incorporated in or before that year.

The expected dataset should look like this:

id_mother  year  diversification
X          1990  a+b+c for subsidiaries incorporated in or before 1990
X          1991  a+b+c for subsidiaries incorporated in or before 1991
X          1992  a+b+c for subsidiaries incorporated in or before 1992
...        ...   ...
X          2005  a+b+c for subsidiaries incorporated in or before 2005