Statalist


Re: st: creating a numeric matrix from string variables


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: creating a numeric matrix from string variables
Date   Wed, 03 Jun 2009 16:59:23 -0500

Joe J <joe.stata@gmail.com> asked, 

> What I am trying to do is create a numeric matrix from, let's say, 3
> string variables.  My data set provides [...]
> [I] was wondering if Mata could help solve this ?

Yes.  Mata is a great way to solve this problem.

Joe has data like, 

        partner1   partner2       partner3
             11A        ---            12Z
             12Z        21S            11K
             14T        11A            12Z

and from this, he wants to create data like, 

               11A   12Z  21S  11K   14T
        11A      0     2    0    0     1
        12Z      2     0    1    1     1
        21S      0     1    0    1     0
        11K      0     1    1    0     0
        14T      1     1    0    0     0

The numbers in the matrix record the number of pairs in the original data.
Observations in the original data record agreements among companies.  
The companies are coded 11A, 12Z, etc.  He wants a matrix recording the 
number of agreements between companies.

I could do this entire problem using only Mata, but that would just be more 
work than is necessary.  Mata really shows its power when used with Stata, 
and vice versa.  So I took Joe's original data and formed from it:

        . list

             +-----------------------------------------------+
             | partner1   partner2   partner3   p1   p2   p3 |
             |-----------------------------------------------|
          1. |      12Z        21S        11K    3    5    2 |
          2. |      14T        11A        12Z    4    1    3 |
          3. |      11A        ---        12Z    1    .    3 |
             +-----------------------------------------------+

New variables p1, p2, and p3 are just like partner1, partner2, 
and partner3, except that I have assigned numeric codes to the companies.
1 is company 11A, 2 is 11K, and so on.  Those numbers will become my 
row and column numbers in the Mata program I will write.  Before 
getting into Mata, however, let me show you the Stata code that took the
original data and added p1, p2, and p3:

        ------------------------------------------------------------------
        input str3 (partner1 partner2 partner3)
        11A --- 12Z
        12Z 21S 11K
        14T 11A 12Z
        end
        save long1
        list

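        * stack the three partner variables into one column
        gen id = _n
        reshape long partner, i(id)
        drop id _j
        drop if partner=="---"
        * keep one observation per company and number them in sort order
        sort partner
        by partner: keep if _n==1
        gen code = _n
        * the maximum of code is the number of companies
        sum code
        save mapping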


        program fixvar
                args oldvar newvar
                rename `oldvar' partner
                sort partner 
                merge partner using mapping
                keep if _merge==1 | _merge==3
                drop _merge
                rename partner `oldvar'
                rename code `newvar'
        end

        use long1, clear
        fixvar partner1 p1
        fixvar partner2 p2
        fixvar partner3 p3
        list
        ------------------------------------------------------------------

I could write about the code, but I think it is self-explanatory for anyone 
who wants to spend the time reading it.
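
For readers on Stata 11 or later: the merge syntax used in fixvar still runs
there but is treated as old syntax, and the newer form is merge m:1.  What
follows is only a sketch of the same coding step in the newer syntax, not a
replacement for the code above.  The tempfiles, the variable obs, and the
forvalues loop are choices made for this sketch; obs is kept so that the
original observation order can be restored at the end.

```stata
clear
input str3 (partner1 partner2 partner3)
11A --- 12Z
12Z 21S 11K
14T 11A 12Z
end
gen long obs = _n                  // remember the original order
tempfile wide mapping
save "`wide'"

* one row per distinct company, coded 1, 2, ... in sort order
reshape long partner, i(obs)
drop if partner == "---"
keep partner
duplicates drop
sort partner
gen code = _n
save "`mapping'"

* attach the code for each partner variable in turn
use "`wide'", clear
forvalues k = 1/3 {
        rename partner`k' partner
        * keep(master match) keeps "---" rows, with a missing code
        merge m:1 partner using "`mapping'", keep(master match) nogenerate
        rename partner partner`k'
        rename code p`k'
}
sort obs
list partner1 partner2 partner3 p1 p2 p3
```

Because merge m:1 does not require the data to be sorted on the key, the
sort partner step inside fixvar has no counterpart here.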

With that dataset, here's the Mata code to produce the desired matrix:


        ------------------------------------------------------------------
        mata:
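        real matrix agmat(real scalar N, string scalar varnames)
        {
                // V is a view onto the coded variables (p1, p2, ...)
                st_view(V, ., tokens(varnames))

                // A[k1,k2] will count the agreements between k1 and k2
                A = J(N, N, 0)

                // for each observation, visit every ordered pair of columns
                for (j=1; j<=rows(V); j++) {
                        for (i1=1; i1<=cols(V); i1++) {
                                for (i2=1; i2<=cols(V); i2++) {
                                        if (i1!=i2) {
                                                k1 = V[j, i1]
                                                k2 = V[j, i2]
                                                // skip pairs with a missing partner
                                                if (k1!=. & k2!=.) {
                                                        A[k1,k2] = A[k1,k2] + 1
                                                }
                                        }
                                }
                        }
                }
                return(A)
        }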
        end
        ------------------------------------------------------------------

To produce the desired result, I then typed

        . mata:

        : agmat(5, "p1 p2 p3")
        [symmetric]
               1   2   3   4   5
            +---------------------+
          1 |  0                  |
          2 |  0   0              |
          3 |  2   1   0          |
          4 |  1   0   1   0      |
          5 |  0   1   1   0   0  |
            +---------------------+

        : end

The code was reasonably straightforward.  We started with the data, 

             +-----------------------------------------------+
             | partner1   partner2   partner3   p1   p2   p3 |
             |-----------------------------------------------|
          1. |      12Z        21S        11K    3    5    2 |
          2. |      14T        11A        12Z    4    1    3 |
          3. |      11A        ---        12Z    1    .    3 |
             +-----------------------------------------------+

Just look at the columns for p1, p2, and p3.  We want to start with a 
5x5 matrix A = 0, and then add 1 to the elements (starting at observation 
1) (3,5), (3,2), (5,3), (5,2), (2,3), (2,5), and then we move on to observation
2, and so on.  

That's what the code does; going across observations, it takes every 
combination of pairs of p1, p2, and p3 and adds 1 to the corresponding 
element of a 5x5 matrix that started out containing 0.
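
To see the pair-counting logic on its own, here is a sketch that works on a
small Mata matrix typed in by hand rather than on a view of the data.
agmat_codes() is a hypothetical name, not part of the code above, and it
differs in one respect: it visits each pair of columns once (i2 > i1) and
adds 1 in both directions, which produces the same counts.  The two assert()
lines check that the result is symmetric and that its elements sum to 14,
the number of ordered pairs (6 from each complete observation plus 2 from
the observation with a missing partner).

```stata
mata:
// hypothetical variant of agmat() that takes the codes as a matrix
real matrix agmat_codes(real scalar N, real matrix V)
{
        real matrix A
        real scalar j, i1, i2, k1, k2

        A = J(N, N, 0)
        for (j=1; j<=rows(V); j++) {
                for (i1=1; i1<cols(V); i1++) {
                        for (i2=i1+1; i2<=cols(V); i2++) {
                                k1 = V[j, i1]
                                k2 = V[j, i2]
                                if (k1!=. & k2!=.) {
                                        // count the pair in both directions
                                        A[k1,k2] = A[k1,k2] + 1
                                        A[k2,k1] = A[k2,k1] + 1
                                }
                        }
                }
        }
        return(A)
}

// p1, p2, p3 for the three observations listed above (. = no partner)
V = (3, 5, 2 \ 4, 1, 3 \ 1, ., 3)
M = agmat_codes(5, V)
M
assert(issymmetric(M))
assert(sum(M) == 14)
end
```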

That matrix is now in Mata.  We could save it in a Mata variable by 
typing, 

        : M = agmat(5, "p1 p2 p3")

More likely, however, I'm guessing Joe will want to form a Stata dataset from 
the result.  There are lots of ways Joe could do that, and nearly all of them 
are more clever than what I'm about to show you, but what follows is the 
easiest to understand:

        . mata: M = agmat(5, "p1 p2 p3")
        . drop _all
        . set obs 5 
        . gen f1 = 0
        . gen f2 = 0
        . gen f3 = 0
        . gen f4 = 0
        . gen f5 = 0
        . mata: 
        : st_view(V=., ., .)
        : V[.,.] = M
        : end

The result of which is, 

        . list

             +------------------------+
             | f1   f2   f3   f4   f5 |
             |------------------------|
          1. |  0    0    2    1    0 |
          2. |  0    0    1    0    1 |
          3. |  2    1    0    1    1 |
          4. |  1    0    1    0    0 |
          5. |  0    1    1    0    0 |
             +------------------------+
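
One of the other ways, sketched under the assumption that the company names
should travel with the matrix: st_addvar() and st_store() create and fill
the variables directly, and st_sstore() fills a string variable holding the
names.  The variable name company is an assumption of this sketch; the names
are listed in code order (1 is 11A, 2 is 11K, 3 is 12Z, 4 is 14T, 5 is 21S).
M is typed in from the output above so that the example stands alone.

```stata
mata:
// M as displayed above
M = (0,0,2,1,0 \ 0,0,1,0,1 \ 2,1,0,1,1 \ 1,0,1,0,0 \ 0,1,1,0,0)
end

drop _all
set obs 5
mata:
// company names, in the order of their numeric codes
names = ("11A" \ "11K" \ "12Z" \ "14T" \ "21S")
st_sstore(., st_addvar("str3", "company"), names)

// create f1, ..., f5 and fill them from the columns of M
idx = st_addvar("double", "f" :+ strofreal(1..cols(M)))
st_store(., idx, M)
end
list
```

Row i of the resulting dataset then belongs to the company named in
company[i], and column fj belongs to the company with code j.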

-- Bill
wgould@stata.com