Re: st: creating a numeric matrix from string variables

Wed, 03 Jun 2009 16:59:23 -0500

Joe J <joe.stata@gmail.com> asked,

> What I am trying to do is create a numeric matrix from, let's say, 3
> string variables.  My data set provides [...]
> [I] was wondering if Mata could help solve this ?

Yes.  Mata is a great way to solve this problem.

Joe has data like,

        partner1   partner2   partner3
             11A        ---        12Z
             12Z        21S        11K
             14T        11A        12Z

and from this, he wants to create data like,

               11A   12Z   21S   11K   14T
        11A      0     2     0     0     1
        12Z      2     0     1     1     1
        21S      0     1     0     1     0
        11K      0     1     1     0     0
        14T      1     1     0     0     0

The numbers in the matrix record the number of pairs in the original data.
Observations in the original data record agreements among companies.
The companies are coded 11A, 12Z, etc.  He wants a matrix recording
the number of agreements between companies.

I could do this entire problem using only Mata, but that would just be
more work than is necessary.  Mata really shows its power when used with
Stata, and vice versa.  So I took Joe's original data and formed from it:

        . list

             +-----------------------------------------------+
             | partner1   partner2   partner3   p1   p2   p3 |
             |-----------------------------------------------|
          1. |      12Z        21S        11K    3    5    2 |
          2. |      14T        11A        12Z    4    1    3 |
          3. |      11A        ---        12Z    1    .    3 |
             +-----------------------------------------------+

New variables p1, p2, and p3 are just like partner1, partner2, and
partner3, except that I have assigned numeric codes to the companies.
1 is company 11A, 2 is 11K, and so on.  Those numbers will become my row
and column numbers in the Mata program I will write.

Before getting into Mata, however, let me show you the Stata code that
took the original data and added p1, p2, and p3:

------------------------------------------------------------------
input str3 (partner1 partner2 partner3)
11A     ---     12Z
12Z     21S     11K
14T     11A     12Z
end
save long1
list

* build the mapping: one observation per company, code = 1, 2, ...
gen id = _n
reshape long partner, i(id)
drop id _j
drop if partner=="---"
sort partner
by partner: keep if _n==1
gen code = _n
sum code
save mapping

* fixvar: look up the numeric code for string variable oldvar
* and store it in newvar
program fixvar
        args oldvar newvar
        rename `oldvar' partner
        sort partner
        merge partner using mapping
        keep if _merge==1 | _merge==3
        drop _merge
        rename partner `oldvar'
        rename code `newvar'
end

use long1, clear
fixvar partner1 p1
fixvar partner2 p2
fixvar partner3 p3
list
------------------------------------------------------------------

I could write about the code, but I think it is self-explanatory for
anyone who wants to spend the time reading it.

With that dataset, here's the Mata code to produce the desired matrix:

------------------------------------------------------------------
mata:
real matrix agmat(real scalar N, string scalar varnames)
{
        st_view(V, ., tokens(varnames))

        A = J(N, N, 0)
        // j indexes observations; i1 and i2 index the partner variables
        for (j=1; j<=rows(V); j++) {
                for (i1=1; i1<=cols(V); i1++) {
                        for (i2=1; i2<=cols(V); i2++) {
                                if (i1!=i2) {
                                        k1 = V[j, i1]
                                        k2 = V[j, i2]
                                        // skip pairs with a missing partner
                                        if (k1!=. & k2!=.) {
                                                A[k1,k2] = A[k1,k2] + 1
                                        }
                                }
                        }
                }
        }
        return(A)
}
end
------------------------------------------------------------------

To produce the desired result, I then typed

        . mata:
        : agmat(5, "p1 p2 p3")
        [symmetric]
               1   2   3   4   5
            +---------------------+
          1 |  0                  |
          2 |  0   0              |
          3 |  2   1   0          |
          4 |  1   0   1   0      |
          5 |  0   1   1   0   0  |
            +---------------------+

        : end

The code was reasonably straightforward.  We started with the data,

             +-----------------------------------------------+
             | partner1   partner2   partner3   p1   p2   p3 |
             |-----------------------------------------------|
          1. |      12Z        21S        11K    3    5    2 |
          2. |      14T        11A        12Z    4    1    3 |
          3. |      11A        ---        12Z    1    .    3 |
             +-----------------------------------------------+

Just look at the columns for p1, p2, and p3.
We want to start with a 5x5 matrix A = 0, and then add 1 to the elements
(starting at observation 1) (3,5), (3,2), (5,3), (5,2), (2,3), (2,5),
and then we move on to observation 2, and so on.  That's what the code
does; going across observations, it takes every ordered pair of
distinct variables among p1, p2, and p3 and adds 1 to the corresponding
element of a 5x5 matrix that started out containing 0.

That matrix is now in Mata.  We could save it in a Mata variable by typing,

        : M = agmat(5, "p1 p2 p3")

More likely, however, I'm guessing Joe will want to form a Stata dataset
from the result.  There are lots of ways Joe could do that, and nearly
all of them are more clever than what I'm about to show you, but what
follows is the easiest to understand:

        . mata: M = agmat(5, "p1 p2 p3")

        . drop _all

        . set obs 5

        . gen f1 = 0
        . gen f2 = 0
        . gen f3 = 0
        . gen f4 = 0
        . gen f5 = 0

        . mata:
        : st_view(V=., ., .)
        : V[.,.] = M
        : end

The result of which is,

        . list

             +------------------------+
             | f1   f2   f3   f4   f5 |
             |------------------------|
          1. |  0    0    2    1    0 |
          2. |  0    0    1    0    1 |
          3. |  2    1    0    1    1 |
          4. |  1    0    1    0    0 |
          5. |  0    1    1    0    0 |
             +------------------------+

-- Bill
wgould@stata.com
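
As an illustration of one of the other ways to form a Stata dataset
from the matrix, here is a minimal sketch that does the copy entirely
from Mata, without typing one -gen- per column.  The function name
mat2data() and its stub argument are made up for this example and are
not part of the original post; st_addobs(), st_addvar(), and st_store()
are standard Mata functions.  Note that the sketch drops whatever data
are in memory.

```stata
mata:
// Hypothetical helper (not from the original post): replace the data in
// memory with one observation per row of M and one variable per column
// of M, named <stub>1, <stub>2, ...
void mat2data(real matrix M, string scalar stub)
{
        real rowvector idx

        stata("drop _all")              // clear the current dataset
        st_addobs(rows(M))              // one observation per row of M
        // create the variables; idx holds their variable indices
        idx = st_addvar("double", stub :+ strofreal(1..cols(M)))
        st_store(., idx, M)             // copy M into the new variables
}
end
```

Assuming agmat() and the variables p1, p2, and p3 are defined as above,
typing

        . mata: mat2data(agmat(5, "p1 p2 p3"), "f")

should leave variables f1 through f5 in memory.  The matrix is computed
before the data are dropped, because Mata evaluates the argument before
entering mat2data().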

