Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: creating a numeric matrix from string variables


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: creating a numeric matrix from string variables
Date   Thu, 04 Jun 2009 09:09:42 -0500

Joe J. <joe.stata@gmail.com> needed to create a numeric, symmetric matrix 
recording the number of agreements between firms and, when last we left off, 
we had created a matrix M in Mata containing the desired result, 

        : M
        [symmetric]
               1   2   3   4   5
            +---------------------+
          1 |  0                  |
          2 |  0   0              |
          3 |  2   1   0          |
          4 |  1   0   1   0      |
          5 |  0   1   1   0   0  |
            +---------------------+

and copied the matrix back to the Stata data in memory, so we had, 

        . list

             +------------------------+
             | f1   f2   f3   f4   f5 |
             |------------------------|
          1. |  0    0    2    1    0 |
          2. |  0    0    1    0    1 |
          3. |  2    1    0    1    1 |
          4. |  1    0    1    0    0 |
          5. |  0    1    1    0    0 |
             +------------------------+

The code we used to copy M to Stata was crude, but it worked.

Joe says 

> [...] I need to:
> 
> 1. Generate a string variable, let's call it Company, that contains the
>    company code, as in the file 'mapping' in your code (you named this
>    variable Partner) . This variable will help me merge other firm-level
>    variables with the matrix, and
> 
> 2. I would like the company codes to be the column headings (that's as the
>    variable names) as well of the matrix, replacing f1, f2, etc in your
>    code.  In a sense these variables are the transpose of the string
>    variable Company.


If I understand Joe, he wants this, 

             +-------------------------------------------------+
             | company   f_11A   f_11K   f_12Z   f_14T   f_21S |
             |-------------------------------------------------|
          1. |     11A       0       0       2       1       0 |
          2. |     11K       0       0       1       0       1 |
          3. |     12Z       2       1       0       1       1 |
          4. |     14T       1       0       1       0       0 |
          5. |     21S       0       1       1       0       0 |
             +-------------------------------------------------+

I wonder whether Joe really wants that because I suspect that is not a 
useful way to store or analyze these data, but I may be wrong, so 
first, let's get the data organized in this way.


Obtaining what Joe wants
------------------------

Our example above is 5x5, but in Joe's real data, the result is 
3,338 x 3,338.  Joe got the program to work, however.

So let's go back to the point where we typed, 

        . mata: M = agmat(5, "p1 p2 p3")

and then went about moving M to Stata, and do the move to Stata in a 
more sophisticated way.

To remind you, mapping.dta looked like this:

        . use mapping, clear 

        . list

             +----------------+
             | partner   code |
             |----------------|
          1. |     11A      1 |
          2. |     11K      2 |
          3. |     12Z      3 |
          4. |     14T      4 |
          5. |     21S      5 |
             +----------------+

If I had partner from this dataset stored in a Mata colvector called partner,
then partner[i] would the identity of the i-th firm.  So here is how we are
going to start:


        . mata: M = agmat(5, "p1 p2 p3")
        . use mapping

        . mata: 

        : partner = st_sdata(., "partner")

        : M
        [symmetric]
               1   2   3   4   5
            +---------------------+
          1 |  0                  |
          2 |  0   0              |
          3 |  2   1   0          |
          4 |  1   0   1   0      |
          5 |  0   1   1   0   0  |
            +---------------------+

        : partner
                 1
            +-------+
          1 |  11A  |
          2 |  11K  |
          3 |  12Z  |
          4 |  14T  |
          5 |  21S  |
            +-------+

The solution is, 

        : st_dropvar(.)           // equivalent to -drop all- in Stata

        : st_addobs(cols(M))     

        : companyidx = st_addvar("str8", "company")

        : varidx     = st_addvar("int", "f_" :+ partner')

        : st_sstore(., companyidx, partner)

        : st_view(V=., ., varidx)

        : V[.,.] = M

        : end

The result is, 

        . list

             +-------------------------------------------------+
             | company   f_11A   f_11K   f_12Z   f_14T   f_21S |
             |-------------------------------------------------|
          1. |     11A       0       0       2       1       0 |
          2. |     11K       0       0       1       0       1 |
          3. |     12Z       2       1       0       1       1 |
          4. |     14T       1       0       1       0       0 |
          5. |     21S       0       1       1       0       0 |
             +-------------------------------------------------+


How I would organize these data
-------------------------------

I would organize these data in the long form, like this, 

             +-----------------------------------+
             | c1   c2   company1   company2   n |
             |-----------------------------------|
          1. |  1    2        11A        11K   0 |
          2. |  1    3        11A        12Z   2 |
          3. |  1    4        11A        14T   1 |
          4. |  1    5        11A        21S   0 |
          5. |  2    3        11K        12Z   1 |
             |-----------------------------------|
          6. |  2    4        11K        14T   0 |
          7. |  2    5        11K        21S   1 |
          8. |  3    4        12Z        14T   1 |
          9. |  3    5        12Z        21S   1 |
         10. |  4    5        14T        21S   0 |
             +-----------------------------------+

In this organization, the 0 agreements between 1 & 2 (11A & 11K) is recorded
only once; the 0 agreements between 2 & 1 (11K & 11A) is not recorded.  The
rule is that if you want to know the agreement between i and j, order i and j
so that i<j, and look for the obs. coresponding to c1==i & c2==j.  The same
rule applies for companies recorded in string form.  In this organization, it
will be easy to merge in variables containing company characteristics.

Anyway, the code to do that is, 

        : st_dropvar(.)           // equivalent to -drop all- in Stata
        : st_addvar("int", ("c1", "c2"))
        : st_addvar("str8", ("company1", "company2"))
        : st_addvar("int", "n")
        : N = rows(M)
        : st_addobs(N*(N+1)/2 - N)

        : st_view(n=., ., "n")
        : st_view(c1=., ., "c1")
        : st_view(c2=., ., "c2")
        : st_sview(company1=., ., "company1")
        : st_sview(company2=., ., "company2")

        : for (i=obs=1; i<=rows(M); i++) {
        :      for (j=i+1; j<=cols(M); j++) {
        :           c1[obs]       = i
        :           c2[obs]       = j
        :           company1[obs] = partner[i]
        :           company2[obs] = partner[j]
        :           n[obs]        = M[i,j]
        :           obs++
        :       }
        : }
        : end

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index