
In Stata, can I use Mata matrices that are larger than Stata’s matsize?

How can I perform clustermat on a Mata matrix?

How can I send clustermat a matrix larger than the currently set matsize?

Title   Mata matrices larger than matsize in Stata
Author Jean Marie Linhart and Kenneth Higbee, StataCorp

You can store a Mata matrix as a Stata matrix even when its dimensions exceed Stata’s matsize.

 . mata: st_matrix("MyMatrix", J(12000,2,7.5))
 . mat dir

Stata’s matrix operators will not work on matrices this large. However, matrix functions and a few other commands work just fine:

 . local r = rowsof(MyMatrix)

 . local c = colsof(MyMatrix)

 . display "rows = `r'  cols = `c'"
 rows = 12000  cols = 2

 . display issymmetric(MyMatrix)
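Because Stata’s matrix operators are unavailable for a matrix this large, one possible workaround is to do the arithmetic in Mata and store the result back in Stata. The lines below are a minimal sketch of that idea, not part of the original FAQ; the matrix name Doubled is hypothetical, and we assume st_matrix() can read an oversized Stata matrix just as it can write one.

```stata
* Store a 12000 x 2 Mata matrix in Stata, as in the example above
mata: st_matrix("MyMatrix", J(12000, 2, 7.5))

* Read the matrix back into Mata, multiply by 2 there, and store the
* result under the (hypothetical) name Doubled
mata: st_matrix("Doubled", 2 * st_matrix("MyMatrix"))

* Matrix functions can then be used to inspect the result
display rowsof(Doubled) " x " colsof(Doubled)
```

The multiplication happens entirely inside Mata, so Stata’s matrix operators are never applied to the large matrix.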

You can use Mata to define and then perform clustermat on a distance matrix. The size of this matrix can exceed Stata’s matsize. The following example shows how to do this.

What if Stata’s cluster command had no option for Euclidean (L2) distance? You could overcome this obstacle by creating an L2 distance matrix based on your data and then using clustermat to obtain the desired cluster analysis. However, what if the number of observations exceeds Stata’s matsize? Using regular Stata matrix commands, you would be unable to create a matrix large enough to accommodate your data.

Using Mata to create the distance matrix allows you to exceed Stata’s matsize and still use clustermat. The Mata function below accepts two arguments, a variable list and a matrix name; it computes the L2 distance matrix for those variables and stores it in Stata under that name.

Here we create 810 observations on four variables (x1, x2, x3, and x4) and set Stata’s matsize to 200.

 . clear

 . set seed 12345

 . set obs 810
 obs was 0, now 810

 . forvalues i = 1/4 {
   2.         gen x`i' = invnormal(uniform())
   3. }

 . set matsize 200

We have 810 observations and our matsize is only 200. In Intercooled Stata, the maximum matsize value is 800; with 810 observations we appear to be stuck. How would we proceed? The answer is to use Mata. First we define the Mata function that will take our data and compute the L2 distance matrix.

 . mata:
 ------------------------------------- mata (type end to exit) ------
 : void function l2dist(string varlist, string Distmat)
 > {
 >   /* creates a matrix with name given by Distmat that is the 
 >    * L2 distance for the observations of the variables specified
 >    * in varlist.  All observations are used.
 >    */
 >         real matrix Dist
 >         real matrix Data
 >         real rowvector V
 >         real scalar i, j
 >         V = st_varindex(tokens(varlist))
 >         st_view(Data,.,V)
 >         Dist = J(rows(Data), rows(Data),0)
 >         for(i=1; i<=rows(Data); i++) {
 >                 for(j=1; j<=i; j++){
 >                         Dist[i,j] = sqrt(rowsum((Data[i,.]
 >                                                - Data[j,.]):^2))
 >                         Dist[j,i] = Dist[i,j]
 >                 }
 >         }
 >         st_matrix(Distmat, Dist)
 > }                       

 : end

Now we use this Mata function to compute the distance matrix and to store that as a Stata matrix.

 . mata: l2dist("x1 x2 x3 x4", "Dist")

Even though Stata’s matsize is currently set at 200, we were able to create an 810 × 810 matrix.

 . mat dir

And clustermat can successfully use the matrix.

 . clustermat single Dist, add
 cluster name: _cl_1
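The double loop in l2dist() is easy to read but does its work one pair of observations at a time. As an alternative sketch (not the method used in this FAQ), the same distances can be computed with matrix operations from the identity ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2*xi'xj. The function name l2dist_vec is hypothetical. Because this formula is subject to rounding error, the sketch clamps small negative values to zero, forces the diagonal to zero, and makes the result exactly symmetric.

```stata
mata:
real matrix l2dist_vec(real matrix X)
{
        /* returns the L2 distance matrix for the rows of X,
         * computed without explicit loops
         */
        real scalar    n
        real colvector s
        real matrix    D2

        n  = rows(X)
        s  = rowsum(X:^2)               /* squared norm of each row */
        D2 = s*J(1, n, 1) + J(n, 1, 1)*s' - 2*X*X'
        D2 = D2 :* (D2 :> 0)            /* clamp negative rounding error */
        D2 = D2 :* (J(n, n, 1) - I(n))  /* force a zero diagonal */
        return(makesymmetric(sqrt(D2)))
}
end

* Small check: the points (0,0) and (3,4) are 5 units apart
mata: l2dist_vec((0,0 \ 3,4))
```

With the data from the example above, a call such as mata: st_matrix("Dist2", l2dist_vec(st_data(., tokens("x1 x2 x3 x4")))) would store the result as a Stata matrix, where Dist2 is again a hypothetical name. This version holds several n × n matrices in memory at once, so the loop may still be preferable for very large n.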




