# Re: st: creating a numeric matrix from string variables

 From [email protected] (William Gould, StataCorp LP) To [email protected] Subject Re: st: creating a numeric matrix from string variables Date Wed, 03 Jun 2009 16:59:23 -0500

```
Joe J <[email protected]> asked,

> What I am trying to do is create a numeric matrix from, let's say, 3
> string variables.  My data set provides [...]
> [I] was wondering if Mata could help solve this ?

Yes.  Mata is a great way to solve this problem.

Joe has data like,

partner1   partner2       partner3
11A        ---            12Z
12Z        21S            11K
14T        11A            12Z

and from this, he wants to create data like,

       11A   12Z   21S   11K   14T
11A      0     2     0     0     1
12Z      2     0     1     1     1
21S      0     1     0     1     0
11K      0     1     1     0     0
14T      1     1     0     0     0

The numbers in the matrix record the number of pairs in the original data.
Observations in the original data record agreements among companies.
The companies are coded 11A, 12Z, etc.  He wants a matrix recording the
number of agreements between companies.

I could do this entire problem using only Mata, but that would just be more
work than is necessary.  Mata really shows its power when used with Stata,
and vice versa.  So I took Joe's original data and formed from it:

. list

     +-----------------------------------------------+
     | partner1   partner2   partner3   p1   p2   p3 |
     |-----------------------------------------------|
  1. |      12Z        21S        11K    3    5    2 |
  2. |      14T        11A        12Z    4    1    3 |
  3. |      11A        ---        12Z    1    .    3 |
     +-----------------------------------------------+

The new variables p1, p2, and p3 are just like partner1, partner2,
and partner3, except that I have assigned numeric codes to the companies.
1 is company 11A, 2 is 11K, and so on.  Those numbers will become my
row and column numbers in the Mata program I will write.  Before
getting into Mata, however, let me show you the Stata code that took the
original data and added p1, p2, and p3:

------------------------------------------------------------------
input str3 (partner1 partner2 partner3)
11A --- 12Z
12Z 21S 11K
14T 11A 12Z
end
save long1
list

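* build the mapping file: one row per distinct company, numbered 1, 2, ...
gen id = _n
reshape long partner, i(id)
drop id _j
drop if partner=="---"
sort partner
by partner: keep if _n==1
gen code = _n
sum code
save mapping

* fixvar: create numeric variable newvar holding the code of string oldvar
program fixvar
    args oldvar newvar
    rename `oldvar' partner
    sort partner
    * match on partner; both datasets are sorted by partner
    merge partner using mapping
    * drop mapping rows that have no match in the data
    keep if _merge==1 | _merge==3
    drop _merge
    rename partner `oldvar'
    rename code `newvar'
end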

use long1, clear
fixvar partner1 p1
fixvar partner2 p2
fixvar partner3 p3
list
------------------------------------------------------------------

I could write about the code, but I think it is self-explanatory for anyone
who wants to spend the time reading it.

With that dataset, here's the Mata code to produce the desired matrix:

------------------------------------------------------------------
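mata:
real matrix agmat(real scalar N, string scalar varnames)
{
    // V is a view onto the numeric code variables, one column each
    st_view(V, ., tokens(varnames))

    // N x N matrix of counts, starting at 0
    A = J(N, N, 0)

    // for each observation j, visit every ordered pair of distinct columns
    for (j=1; j<=rows(V); j++) {
        for (i1=1; i1<=cols(V); i1++) {
            for (i2=1; i2<=cols(V); i2++) {
                if (i1!=i2) {
                    k1 = V[j, i1]
                    k2 = V[j, i2]
                    // skip the pair if either partner is missing
                    if (k1!=. & k2!=.) {
                        A[k1,k2] = A[k1,k2] + 1
                    }
                }
            }
        }
    }
    return(A)
}
end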
------------------------------------------------------------------

To produce the desired result, I then typed

. mata:

: agmat(5, "p1 p2 p3")
[symmetric]
       1   2   3   4   5
    +---------------------+
  1 |  0                  |
  2 |  0   0              |
  3 |  2   1   0          |
  4 |  1   0   1   0      |
  5 |  0   1   1   0   0  |
    +---------------------+

: end

The code was reasonably straightforward.  We started with the data,

     +-----------------------------------------------+
     | partner1   partner2   partner3   p1   p2   p3 |
     |-----------------------------------------------|
  1. |      12Z        21S        11K    3    5    2 |
  2. |      14T        11A        12Z    4    1    3 |
  3. |      11A        ---        12Z    1    .    3 |
     +-----------------------------------------------+

Just look at the columns for p1, p2, and p3.  We want to start with a
5x5 matrix A = 0, and then add 1 to the elements (starting at observation
1) (3,5), (3,2), (5,3), (5,2), (2,3), (2,5), and then we move on to observation
2, and so on.

That's what the code does; going across observations, it takes every
combination of pairs of p1, p2, and p3 and adds 1 to a 5x5 matrix that
started out containing 0.

That matrix is now in Mata.  We could save it in a Mata variable by
typing,

: M = agmat(5, "p1 p2 p3")

More likely, however, I'm guessing Joe will want to form a Stata dataset from
the result.  There are lots of ways Joe could do that, and nearly all of them
are more clever than what I'm about to show you, but what follows is the
easiest to understand:

. mata: M = agmat(5, "p1 p2 p3")
. drop _all
. set obs 5
. gen f1 = 0
. gen f2 = 0
. gen f3 = 0
. gen f4 = 0
. gen f5 = 0
. mata:
: st_view(V=., ., .)
: V[.,.] = M
: end

The result of which is,

. list

     +------------------------+
     | f1   f2   f3   f4   f5 |
     |------------------------|
  1. |  0    0    2    1    0 |
  2. |  0    0    1    0    1 |
  3. |  2    1    0    1    1 |
  4. |  1    0    1    0    0 |
  5. |  0    1    1    0    0 |
     +------------------------+

-- Bill
[email protected]
```
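
The fixvar program in the post attaches the numeric codes by merging each partner variable against a mapping file. A shorter route, sketched here on the assumption that codes assigned in sorted order are acceptable, lets the `group()` function of `egen` number the companies while the data are in long form. The names `id`, `slot`, and `p` are choices made for this sketch and are not part of the original code.

```stata
clear
input str3 (partner1 partner2 partner3)
11A --- 12Z
12Z 21S 11K
14T 11A 12Z
end

gen id = _n
reshape long partner, i(id) j(slot)

* group() numbers the distinct strings 1, 2, ... in sorted order;
* rows excluded by the if condition get a missing code
egen p = group(partner) if partner != "---"

* back to one row per agreement: creates partner1-partner3 and p1-p3
reshape wide partner p, i(id) j(slot)
drop id
list
```

Because the strings sort as 11A, 11K, 12Z, 14T, 21S, this should give the same coding as the post (11A is 1, 11K is 2, and so on), with the observations left in their original order.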

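
The post mentions that there are more compact ways to turn the Mata matrix into a Stata dataset than generating f1 through f5 by hand. One possibility, offered as a sketch and not as the author's method, is to let Mata create the variables with `st_addvar()` and fill them with `st_store()`. To keep the example self-contained, M is typed in from the `agmat(5, "p1 p2 p3")` output shown in the post; `idx` is a name chosen here.

```stata
drop _all
mata:
// M as printed in the post (assumption: typed in by hand for this sketch)
M = (0,0,2,1,0 \ 0,0,1,0,1 \ 2,1,0,1,1 \ 1,0,1,0,0 \ 0,1,1,0,0)

// one observation per row of M
st_addobs(rows(M))

// create f1, f2, ..., one variable per column; idx holds their indices
idx = st_addvar("double", "f" :+ strofreal(1..cols(M)))

// copy the whole matrix into the new variables
st_store(., idx, M)
end
list
```

Since the variable names are built from `cols(M)`, the same lines should work unchanged for a matrix of any size, which the hand-written `gen` statements do not.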