How can I perform

How can I send

Title | Mata matrices larger than matsize in Stata |

Author | Jean Marie Linhart and Kenneth Higbee, StataCorp |

You can store a Mata matrix whose dimensions exceed Stata’s
**matsize** as a Stata matrix.

    . mata: st_matrix("MyMatrix", J(12000,2,7.5))
    . mat dir
        MyMatrix[12000,2]

Matrix operators will not work on such large matrices. However, matrix functions and a few other commands work just fine:

    . local r = rowsof(MyMatrix)
    . local c = colsof(MyMatrix)
    . display "rows = `r' cols = `c'"
    rows = 12000 cols = 2
    . display issymmetric(MyMatrix)
    0
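Because matrix operators are unavailable at this size, one way to compute with such a matrix is to read it back into Mata, do the work there, and return only a small result to Stata. The sketch below assumes `MyMatrix` exists as created above; the result name `ColSums` is a hypothetical name chosen for illustration.

```stata
mata:
// Read the large Stata matrix into Mata, where matsize does not apply
M = st_matrix("MyMatrix")
// Compute the column sums in Mata and store the 1 x 2 result in Stata
st_matrix("ColSums", colsum(M))
end
matrix list ColSums
```

Only the 1 × 2 result is handled by Stata's matrix commands, so it stays well within **matsize**.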

You can use Mata to define and then perform
**clustermat** on
a distance matrix. The size of this matrix can exceed Stata’s
**matsize**. The following example shows how to do this.

What if Stata’s
**cluster** command
had no option for Euclidean (L2) distance? You could overcome this obstacle
by creating an L2 distance matrix based on your data and then using
**clustermat** to obtain the desired cluster analysis. However, what if
the number of observations exceeds Stata’s **matsize**? Using
regular Stata matrix commands, you would be unable to create a matrix large
enough to accommodate your data.

Using Mata to create the distance matrix allows you to exceed Stata’s
**matsize** and yet still use **clustermat**. Below you will see a
Mata function that accepts as arguments a variable list and a matrix name
for the L2 distance matrix it creates and stores back in Stata.

Here we create 810 observations on four variables (**x1**, **x2**,
**x3**, and **x4**) and set Stata’s **matsize** to 200.

    . clear
    . set seed 12345
    . set obs 810
    obs was 0, now 810
    . forvalues i = 1/4 {
      2. gen x`i' = invnormal(uniform())
      3. }
    . set matsize 200

We have 810 observations and our matsize is only 200. In Intercooled Stata,
the maximum **matsize** value is 800; with 810 observations we appear to
be stuck. How would we proceed? The answer is to use Mata. First we
define the Mata function that will take our data and compute the L2 distance
matrix.

    . mata:
    ------------------------------------- mata (type end to exit) ------
    : void function l2dist(string varlist, string Distmat)
    > {
    >     /* creates a matrix with name given by Distmat that is the
    >      * L2 distance for the observations of the variables specified
    >      * in varlist.  All observations are used.
    >      */
    >     real matrix Dist
    >     real matrix Data
    >
    >     V = st_varindex(tokens(varlist))
    >     Data = J(1,1,0)
    >     st_view(Data,.,V)
    >     Dist = J(rows(Data), rows(Data),0)
    >     for(i=1; i<=rows(Data); i++) {
    >         for(j=1; j<=i; j++){
    >             Dist[i,j] = sqrt(rowsum((Data[i,.]
    >                 - Data[j,.]):^2))
    >             Dist[j,i] = Dist[i,j]
    >         }
    >     }
    >     st_matrix(Distmat, Dist)
    > }

    : end
    --------------------------------------------------------------------

Now we use this Mata function to compute the distance matrix and to store that as a Stata matrix.

. mata: l2dist("x1 x2 x3 x4", "Dist")

Even though Stata’s **matsize** is currently set at 200, we were
able to create an 810 × 810 matrix.

    . mat dir
            Dist[810,810]
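Before clustering, it can be worth confirming that the stored matrix behaves like a distance matrix. The following is a minimal sketch, assuming `Dist` was created by the `l2dist()` call above. It checks symmetry with a Stata matrix function and inspects the diagonal from Mata, since each observation should be at distance zero from itself.

```stata
* Symmetry: l2dist() copies each entry to its mirror position
display issymmetric(Dist)

* Distance from the first observation to itself
display Dist[1,1]

* Largest absolute value on the diagonal, computed in Mata
mata: max(abs(diagonal(st_matrix("Dist"))))
```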

And **clustermat** can successfully use the matrix.

    . clustermat single Dist, add
    cluster name: _cl_1
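Once the cluster analysis has been added to the dataset, the usual postclustering commands should apply to it. As a sketch, the commands below cut the tree into groups and tabulate them. The variable name `grp` and the choice of three groups are assumptions made for illustration only. No `name()` option is given, so the most recent cluster analysis is used.

```stata
* Assumes the clustermat call above succeeded, so the most recent
* cluster analysis is attached to the 810 observations in memory
cluster generate grp = groups(3)
tabulate grp
```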