In Stata, can I use Mata matrices that are larger than Stata’s matsize?
How can I perform clustermat on a Mata matrix?
How can I send clustermat a matrix larger than the currently set
matsize?
Title: Mata matrices larger than matsize in Stata
Author: Jean Marie Linhart and Kenneth Higbee, StataCorp
Date: July 2006
You can store a Mata matrix in Stata as a Stata matrix even when its
dimensions exceed Stata’s matsize.
. mata: st_matrix("MyMatrix", J(12000,2,7.5))
. mat dir
MyMatrix[12000,2]
Matrix operators will not work on such large matrices. However,
matrix functions and a few other commands work just fine:
. local r = rowsof(MyMatrix)
. local c = colsof(MyMatrix)
. display "rows = `r' cols = `c'"
rows = 12000 cols = 2
. display issymmetric(MyMatrix)
0
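Where an operator would normally be needed, one possible workaround is to do the arithmetic in Mata and store the result back in Stata. The lines below are a minimal sketch, assuming MyMatrix exists as created above; the name Scaled is hypothetical and chosen only for illustration.

```stata
* Read the Stata matrix into Mata, do the arithmetic there,
* and store the result back in Stata under a new (hypothetical) name.
mata: st_matrix("Scaled", 2 :* st_matrix("MyMatrix"))

* Matrix functions can then be used to check the stored result.
matrix dir
display rowsof(Scaled)
display colsof(Scaled)
```

Reading with the one-argument form st_matrix(name) and writing with the two-argument form st_matrix(name, X) keeps all the arithmetic on the Mata side.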
You can use Mata to build a distance matrix and then run clustermat on
it. The size of this matrix can exceed Stata’s matsize. The following
example shows how to do this.
What if Stata’s cluster command had no option for Euclidean (L2)
distance? You could overcome this obstacle
by creating an L2 distance matrix based on your data and then using
clustermat to obtain the desired cluster analysis. However, what if
the number of observations exceeds Stata’s matsize? Using
regular Stata matrix commands, you would be unable to create a matrix large
enough to accommodate your data.
Using Mata to create the distance matrix allows you to exceed Stata’s
matsize and yet still use clustermat. Below you will see a
Mata function that accepts as arguments a variable list and a matrix name
for the L2 distance matrix it creates and stores back in Stata.
Here we create 810 observations on four variables (x1, x2,
x3, and x4) and set Stata’s matsize to 200.
. clear
. set seed 12345
. set obs 810
obs was 0, now 810
. forvalues i = 1/4 {
2. gen x`i' = invnormal(uniform())
3. }
. set matsize 200
We have 810 observations and our matsize is only 200. In Intercooled Stata,
the maximum matsize value is 800; with 810 observations we appear to
be stuck. How would we proceed? The answer is to use Mata. First we
define the Mata function that will take our data and compute the L2 distance
matrix.
. mata:
------------------------------------- mata (type end to exit) ------
: void function l2dist(string varlist, string Distmat)
> {
> /* creates a matrix with name given by Distmat that is the
> * L2 distance for the observations of the variables specified
> * in varlist. All observations are used.
> */
> real matrix Dist
> real matrix Data
> real rowvector V
> real scalar i, j
>
> V = st_varindex(tokens(varlist))
> Data = J(1,1,0)
> st_view(Data,.,V)
> Dist = J(rows(Data), rows(Data),0)
> for(i=1; i<=rows(Data); i++) {
> for(j=1; j<=i; j++){
> Dist[i,j] = sqrt(rowsum((Data[i,.]
> - Data[j,.]):^2))
> Dist[j,i] = Dist[i,j]
> }
> }
> st_matrix(Distmat, Dist)
> }
: end
--------------------------------------------------------------------
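The double loop in l2dist() computes one pair of observations at a time. As an alternative, the same distances can be obtained with matrix algebra from the identity |a - b|^2 = |a|^2 + |b|^2 - 2a'b. The function below is a sketch of that swapped-in technique, not the method used in this FAQ; the name l2dist_vec is hypothetical, and results may differ from l2dist() by floating-point rounding.

```stata
mata:
void function l2dist_vec(string scalar varlist, string scalar Distmat)
{
    real matrix    Data, G, D2
    real colvector sq

    // view onto all observations of the named variables
    st_view(Data, ., tokens(varlist))

    G  = Data * Data'                 // inner products of all pairs
    sq = diagonal(G)                  // squared norm of each observation

    // squared distances: |a|^2 + |b|^2 - 2a'b
    D2 = ((-2*G) :+ sq) :+ sq'

    // rounding can leave tiny negative values; set them to zero
    D2 = D2 :* (D2 :> 0)
    _diag(D2, 0)                      // distance to self is exactly 0

    // mirror the lower triangle so the result is exactly symmetric
    st_matrix(Distmat, makesymmetric(sqrt(D2)))
}
end
```

Once l2dist() has created Dist as shown in this example, a call such as mata: l2dist_vec("x1 x2 x3 x4", "Dist2") followed by mata: mreldif(st_matrix("Dist"), st_matrix("Dist2")) gives a quick check that the two versions agree to within rounding. Dist2 is again a hypothetical name.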
Now we call l2dist() to compute the distance matrix and store it as a
Stata matrix named Dist.
. mata: l2dist("x1 x2 x3 x4", "Dist")
Even though Stata’s matsize is currently set at 200, we were
able to create an 810 × 810 matrix.
. mat dir
Dist[810,810]
And clustermat can successfully use the Dist matrix.
. clustermat single Dist, add
cluster name: _cl_1
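Because the add option attaches the cluster analysis to the data in memory, the usual postclustering commands should then be available. A brief sketch, assuming the cluster name _cl_1 reported above; the variable name g3 and the choice of three groups are illustrative only.

```stata
* Cut the single-linkage hierarchy stored as _cl_1 into 3 groups.
* g3 is a hypothetical variable name; 3 is an arbitrary choice.
cluster generate g3 = groups(3), name(_cl_1)

* Inspect how many observations fall in each group.
tabulate g3
```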