Title | Mata matrices larger than Stata's maximum matrix size
Author | Jean Marie Linhart and Kenneth Higbee, StataCorp
You can save a Mata matrix that exceeds Stata’s maximum matrix size as a Stata matrix.
    . mata: st_matrix("MyMatrix", J(12000,2,7.5))

    . mat dir
        MyMatrix[12000,2]
Matrix operators will not work on such large matrices. However, matrix functions and a few other commands work just fine:
    . local r = rowsof(MyMatrix)

    . local c = colsof(MyMatrix)

    . display "rows = `r' cols = `c'"
    rows = 12000 cols = 2

    . display issymmetric(MyMatrix)
    0
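The original example does not cover arithmetic on such a matrix. Because Stata's matrix operators fail at this size, one possible workaround is to do the arithmetic in Mata, which has no matsize limit, and then store the result back with st_matrix(). This is a minimal sketch under that assumption. The Mata names X and Y and the Stata matrix name MyMatrix2 are hypothetical.

```stata
mata:
// re-create the 12000 x 2 example matrix so this runs on its own
st_matrix("MyMatrix", J(12000,2,7.5))
X = st_matrix("MyMatrix")     // copy the Stata matrix into Mata
Y = 2 :* X                    // do the arithmetic in Mata, not with Stata's matrix operators
st_matrix("MyMatrix2", Y)     // store the 12000 x 2 result as a Stata matrix
end
```

The result can then be inspected with the same functions shown above, for example rowsof(MyMatrix2).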
You can use Mata to build a distance matrix and then pass it to clustermat, even when the matrix is larger than Stata’s maximum matrix size. The following example shows how.
What if Stata’s cluster command had no option for Euclidean (L2) distance? You could overcome this obstacle by creating an L2 distance matrix based on your data and then using clustermat to obtain the desired cluster analysis. However, what if the number of observations exceeds Stata’s maximum matrix size? Using regular Stata matrix commands, you would be unable to create a matrix large enough to accommodate your data.
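For reference, the L2 (Euclidean) distance between observations i and j, measured on variables x_1 through x_p, is shown below. The distance matrix therefore has one row and one column per observation. It is symmetric with zeros on the diagonal, and its size grows with the number of observations rather than with the number of variables.

```latex
d_{ij} = \sqrt{\sum_{k=1}^{p} \left( x_{ik} - x_{jk} \right)^{2}},
\qquad d_{ii} = 0, \qquad d_{ij} = d_{ji}
```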
Using Mata to create the distance matrix lets you exceed Stata’s maximum matrix size and still use clustermat. Below is a Mata function that takes two arguments, a variable list and a matrix name. It computes the L2 distance matrix for the listed variables and stores it in Stata under that name.
Here we create 810 observations on four variables (x1, x2, x3, and x4).
    . clear

    . set seed 12345

    . set obs 810
    obs was 0, now 810

    . forvalues i = 1/4 {
      2.     gen x`i' = invnormal(uniform())
      3. }
We have 810 observations. In Intercooled Stata the maximum matrix size is 800, so the 810 × 810 distance matrix is beyond the reach of Stata’s matrix commands, and we appear to be stuck. How would we proceed? The answer is to use Mata. First we define the Mata function that will take our data and compute the L2 distance matrix.
    mata:
    void function l2dist(string scalar varlist, string scalar Distmat)
    {
        /* creates a matrix with name given by Distmat that is the
         * L2 distance for the observations of the variables specified
         * in varlist. All observations are used.
         */
        real matrix    Dist, Data
        real rowvector V
        real scalar    i, j

        V = st_varindex(tokens(varlist))
        Data = J(1,1,0)
        st_view(Data, ., V)                   // view onto the data, no copy
        Dist = J(rows(Data), rows(Data), 0)   // diagonal stays 0
        for (i=1; i<=rows(Data); i++) {
            for (j=1; j<i; j++) {
                // distance between observations i and j
                Dist[i,j] = sqrt(rowsum((Data[i,.] - Data[j,.]):^2))
                Dist[j,i] = Dist[i,j]         // matrix is symmetric
            }
        }
        st_matrix(Distmat, Dist)              // store as a Stata matrix
    }
    end
Now we use this Mata function to compute the distance matrix and store it as a Stata matrix named Dist.
    . mata: l2dist("x1 x2 x3 x4", "Dist")
Even though the maximum matrix size in Intercooled Stata is 800, we were able to create an 810 × 810 matrix.
    . mat dir
        Dist[810,810]
And clustermat can successfully use the matrix.
    . clustermat single Dist, add
    cluster name: _cl_1
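The double loop in l2dist() handles one pair of observations per iteration. A possible variant, sketched here as an assumption rather than as part of the original FAQ, uses Mata's colon operator to subtract one observation from every row at once. This fills a whole column of the distance matrix per iteration. The function name l2dist_vec() is hypothetical. The variant computes each pair twice instead of once. It may still be faster because it replaces the per-element loop with one vectorized expression, but we have not timed it.

```stata
mata:
// Hypothetical variant of l2dist(): same arguments and result, but each
// pass of the loop fills one whole column of the distance matrix.
void function l2dist_vec(string scalar varlist, string scalar Distmat)
{
    real matrix    Data, Dist
    real rowvector V
    real scalar    i, n

    V = st_varindex(tokens(varlist))
    Data = J(1,1,0)
    st_view(Data, ., V)          // view onto the variables, all observations
    n = rows(Data)
    Dist = J(n, n, 0)
    for (i=1; i<=n; i++) {
        // subtract row i from every row, square, and sum across variables:
        // the result is the column of distances from observation i
        Dist[., i] = sqrt(rowsum((Data :- Data[i,.]):^2))
    }
    st_matrix(Distmat, Dist)     // store as a Stata matrix
}
end
```

It would be called the same way as l2dist(), for example mata: l2dist_vec("x1 x2 x3 x4", "Dist2"), where Dist2 is a hypothetical matrix name.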