**[M-5] asarray()** -- Associative arrays

__Syntax__

*A* **=** **asarray_create(**[ *keytype* *declare A *
[, *keydim*
[, *minsize*
[, *minratio*
[, *maxratio* ]]]]]**)**

**asarray(***A***,** *key***,** *a***)** *A[key] = a *

*a* **=** **asarray(***A***,** *key***)** *a = A[key] or a = notfound*

**asarray_remove(***A***,** *key***)** *delete A[key] if it exists*

*bool* **=** **asarray_contains(***A***,** *key***)** *A[key] exists? *

*N* **=** **asarray_elements(***A***)** *# of elements in A *

*keys* **=** **asarray_keys(***A***)** *all keys in A *

*loc* **=** **asarray_first(***A***)** *location of first element *
*or NULL *

*loc* **=** **asarray_next(***A***,** *loc***)** *location of next element *
*or NULL *

*key* **=** **asarray_key(***A***,** *loc***)** *key at loc *

*a* **=** **asarray_contents(***A***,** *loc***)** *contents a at loc *

**asarray_notfound(***A***,** *notfound***)** *set notfound value *

*notfound* **=** **asarray_notfound(***A***)** *query notfound value *

where

*A:* Associative array *A[key]*. Created by **asarray_create()** and
passed to the other functions. If *A* is declared, it is
declared *transmorphic*.

*keytype:* Element type of keys; **"string"**, **"real"**, **"complex"**, or
**"pointer"**. Optional; default **"string"**.

*keydim:* Dimension of key; 1 <= *keydim* <= 50. Optional; default 1.

*minsize:* Initial size of hash table used to speed locating keys in
*A*; *real scalar*; 5 <= *minsize* <= 1,431,655,764. Optional;
default 100.

*minratio:* Fraction filled at which hash table is automatically
downsized; *real scalar*; 0 <= *minratio* <= 1. Optional;
default 0.5.

*maxratio:* Fraction filled at which hash table is automatically
upsized; *real scalar*; 1 < *maxratio* <= **.** (meaning
infinity). Optional; default 1.5.

*key:* Key under which an element is stored in *A*; *string scalar*
by default; type and dimension are declared using
**asarray_create()**.

*a:* Element of *A*; *transmorphic*; may be anything of any size;
different *A[key]* elements may have different types of
contents.

*bool:* Boolean logic value; *real scalar*; equal to 1 or 0 meaning
true or false.

*N:* Number of elements stored in *A*; *real scalar*; 0 <= *N* <=
2,147,483,647.

*keys:* List of all keys that exist in *A*. Each row is a key.
Thus *keys* is a *string colvector* if keys are *string*
*scalars*, a *string matrix* if keys are *string vectors*, a
*real colvector* if keys are *real scalars*, etc. Note that
**rows(***keys***)** = *N*.

*loc:* A location in *A*; *transmorphic*. The first location is
returned by **asarray_first()**, subsequent locations by
**asarray_next()**. *loc***==NULL** when there are no more
elements.

*notfound:* Value returned by **asarray(***A***,** *key***)** when *key* does not exist
in *A*. *notfound* **=** **J(0,0,.)** by default.

__Description__

**asarray()** provides one- and multi-dimensional associative arrays, also
known as containers, maps, dictionaries, indices, and hash tables.

Also see **[M-5] AssociativeArray()** for a class-based interface into the
functions documented here.

__Remarks__

Before writing a program using **asarray()**, you should try it
interactively. Remarks are presented under the following headings:

Detailed description
Example 1: Scalar keys and scalar contents
Example 2: Scalar keys and matrix contents
Example 3: Vector keys and scalar contents; sparse matrix
Setting the efficiency parameters

__Detailed description__

In associative arrays, rather than being dense integers, the indices can
be anything, even strings. So you might have *A["Frank Smith"]* equal to
something and *A["Mary Jones"]* equal to something else. In Mata, you
write that as **asarray(***A***,** **"Frank Smith",*** something***)** and **asarray(***A***,** **"Mary**
**Jones",*** somethingelse***)** to define the elements and **asarray(***A***,** **"Frank**
**Smith")** and **asarray(***A***,** **"Mary Jones")** to obtain their values.

*A* = **asarray_create()** declares (creates) an associative array. The
function allows arguments, but they are optional. Without arguments,
**asarray_create()** declares an associative array with string scalar keys,
corresponding to the *A["Frank Smith"]* and *A["Mary Jones"]* example above.

*A* = **asarray_create(***keytype***,** *keydim***)** declares an associative array with
*keytype* keys each of dimension 1 *x* *keydim*. **asarray_create("string",** **1)**
is equivalent to **asarray_create()** without arguments.
**asarray_create("string",** **2)** declares the keys to be string, as before,
but now they are 1 *x* 2 rather than 1 *x* 1, so array elements would be of
the form *A["Frank Smith", "Mary Jones"]*. *A["Mary Jones", "Frank Smith"]*
would be a different element. **asarray_create("real",** **2)** declares the
keys to be real 1 *x* 2, which would somewhat correspond to our ordinary
idea of a matrix, namely *A[i,j]*. The difference would be that to store,
say, *A[100,980]*, it would not be necessary to store the interior
elements, and in addition to storing *A[100,980]*, we could store
*A[3.14159,2.71828*].

**asarray_create()** has three more optional arguments: *minsize*, *minratio*,
and *maxratio*. We recommend that you do not specify them. They are
discussed in *Setting the efficiency parameters* under **Remarks** below.

**asarray(***A***,** *key***,** *a***)** sets or resets element *A[key]* = *a*. Note that if you
declare *key* to be 1 *x* 2, you must use the parentheses vector notation to
specify key literals, such as **asarray(***A***,** **(100,980),** **2.15)**.
Alternatively, if **k** = **(100,980)**, then you can omit the parentheses in
**asarray(***A***,** **k,** **2.15)**.

**asarray(***A***,** *key***)** returns element *A[key]* or it returns *notfound* if the
element does not exist. By default, *notfound* is **J(0,0,.)**, but you can
change that using **asarray_notfound()**. If you redefined *notfound* to be **0**
and defined keys to be real 1 *x* 2, you would be on your way to recording
sparse matrices efficiently.

**asarray_remove(***A***,** *key***)** removes *A[key]*, or it does nothing if *A[key]* is
already undefined.

**asarray_contains(***A***,** *key***)** returns 1 if *A[key]* is defined, and it returns 0
otherwise.

**asarray_elements(***A***)** returns the number of elements stored in *A*.

**asarray_keys(***A***)** returns a vector or matrix containing all the keys, one
to a row. The keys are not in alphabetical or numerical order. If you
want them that way, code **sort(asarray_keys(***A***),** **1)** if your keys are
scalar, or in general, code **sort(asarray_keys(***A***),** *idx***)**; see **[M-5] sort()**.

**asarray_first(***A***)** and **asarray_next(***A***,** *loc***)** provide a way of obtaining the
names one at a time. Code

**for (loc=asarray_first(***A***);** **loc!=NULL;** **loc=asarray_next(***A***, loc)) {**
...
**}**

**asarray_key(***A***,** *loc***)** and **asarray_contents(***A***,** *loc***)** return the key and
contents at *loc*, so the loop becomes

**for (loc=asarray_first(***A***);** **loc!=NULL;** **loc=asarray_next(***A***, loc)) {**
...
... **asarray_key(***A***, loc)** ...
...
... **asarray_contents(***A***, loc)** ...
...
**}**

**asarray_notfound(***A***,** *notfound***)** defines what **asarray(***A***,** *key***)** returns when
the element does not exist. By default, *notfound* is **J(0,0,.)**, which is
to say, a 0 *x* 0 real matrix. You can reset *notfound* at any time.
**asarray_notfound(***A***)** returns the current value of *notfound*.

__Example 1: Scalar keys and scalar contents__

: A = asarray_create()
: asarray(A, "bill", 1.25)
: asarray(A, "mary", 2.75)
: asarray(A, "dan", 1.50)
: asarray(A, "bill")
1.25

: asarray(A, "mary")
2.75

: asarray(A, "mary", 3.25)
: asarray(A, "mary")
3.25

: sum = 0
: for (loc=asarray_first(A); loc!=NULL; loc=asarray_next(A, loc)) {
sum = sum + asarray_contents(A, loc)
}
: sum
6

: sum/asarray_elements(A)
2

__Example 2: Scalar keys and matrix contents__

: A = asarray_create()
: asarray(A, "Count", (1,2\3,4))
: asarray(A, "Hilbert", Hilbert(3))
: asarray(A, "Count")
1 2
+---------+
1 | 1 2 |
2 | 3 4 |
+---------+

: asarray(A, "Hilbert")
[symmetric]
1 2 3
+-------------------------------------------+
1 | 1 |
2 | .5 .3333333333 |
3 | .3333333333 .25 .2 |
+-------------------------------------------+

__Example 3: Vector keys and scalar contents; sparse matrix__

: A = asarray_create("real", 2)
: asarray_notfound(A, 0)
: asarray(A, ( 1, 1), 1)
: asarray(A, (1000, 999), .5)
: asarray(A, (1000, 1000), 1)
: asarray(A, (1000, 1001), .5)
: asarray(A, (1,1))
1

: asarray(A, (2,2))
0

: // one way to get the trace:
: trace = 0
: for (i=1; i<=1000; i++) trace = trace + asarray(A, (i,i))
: trace
2

: // another way to get the trace
: trace = 0
: for (loc=asarray_first(A); loc!=NULL; loc=asarray_next(A, loc)) {
index = asarray_key(A, loc)
if (index[1]==index[2]) {
trace = trace + asarray_contents(A, loc)
}
}
: trace
2

__Setting the efficiency parameters__

The syntax **asarray_create()** is

*A* **=** **asarray_create(***keytype***,** *keydim***,** *minsize***,** *minratio***,** *maxratio***)**

All arguments are optional. The first two specify the characteristics of
the key and their use has already been illustrated. The last three are
efficiency parameters. In most circumstances, we recommend you do not
specify them. The default values have been chosen to produce reasonable
execution times with reasonable memory consumption.

**asarray()** works via hash tables. Say we wish to record *n* entries. The
idea is to allocate a hash table of *N* rows, where *N* can be less than,
equal to, or greater than *n*. When one needs to find the element
corresponding to a key, one calculates a function of the key, called a
hash function, that returns an integer *h* from 1 to *N*. One first looks in
row *h*. If row *h* is already in use and the keys are different, we have a
collision. In that case, we have to allocate a duplicates list for the
*h*th row and put the duplicate keys and contents into that list.
Collisions are bad because, when they occur, **asarray()** has to allocate a
duplicates list, requiring both time and memory, though it does not
require much. When fetching results, if row *h* has a duplicates list,
**asarray()** has to search the list, which it does sequentially, and that
takes extra time, too. Hash tables work best when collisions happen
rarely.

Obviously, collisions are certain to occur if *N* < *n*. Note, however, that
although performance suffers, the method does not break. A hash table of
*N* can hold any number of entries, even if *N* < *n*.

Performance depends on details of implementation. We have examined the
behavior of **asarray()** and discovered that collisions rarely occur when
*n*/*N* <= 0.75. When *n*/*N* = 1.5, performance suffers, but not by as much as
you might expect. Around *n*/*N* = 2, performance degrades considerably.

When you add or remove an element, **asarray()** examines *n*/*N* and considers
rebuilding the table with a larger or smaller *N*; it rebuilds the table
when *n*/*N* is large to preserve efficiency. It rebuilds the table when *n*/*N*
is small to conserve memory. Rebuilding the table is a
computer-intensive operation, and so should not be performed too often.

In making these decisions, **asarray()** uses three parameters:

*maxratio:* When *n*/*N* >= *maxratio*, the table is upsized to *N* = 1.5**n*.

*minratio:* When *n*/*N* <= *minratio*/1.5, the table is downsized to *N* =
1.5**n*. (For an exception, see *minsize*.)

*minsize:* If the new *N* < 1.5**minsize*, the table is downsized to *N* =
1.5**minsize* if it is not already that size.

The default values of the three parameters are 1.5, 0.5, and 100. You
can reset them, though you are unlikely to improve on the default values
of *minratio* and *maxratio*.

You can improve on *minsize* when you know the number of elements that will
be in the table and that number is greater than 100. For instance, if
you know the table will contain at least 1,000 elements, starting *minsize*
at 1,000, which implies *N* = 1,500, will prevent two rescalings, namely,
from 150 to 451, and from 451 to 1,354. This saves a little time.

You can also turn off the resizing features. Setting *minratio* to 0 turns
off downsizing. Setting *maxratio* to **.** (missing) turns off upsizing. You
might want to turn off both downsizing and upsizing if you set *minsize*
sufficiently large for your problem.

We would never recommend turning off upsizing alone, and we seldom would
recommend turning off downsizing alone. In a program where it is known
that the array will exist for only a short time, however, turning off
downsizing can be efficient. In a program where the array might exist
for a considerable time, turning off downsizing is dangerous because then
the array could only grow (and probably will).

__Conformability__

**asarray_create(***keytype***,** *keydim***,** *minsize***,** *minratio***,** *maxratio***)**:
*keytype*: 1 *x* 1 (optional)
*keydim*: 1 *x* 1 (optional)
*minsize*: 1 *x* 1 (optional)
*minratio*: 1 *x* 1 (optional)
*maxratio*: 1 *x* 1 (optional)
*result*: *transmorphic*

**asarray(***A***,** *key***,** *a***)**:
*A*: *transmorphic*
*key*: 1 *x* *keydim*
*a*: *r_key x c_key*
*result*: *void*

**asarray(***A***,** *key***)**:
*A*: *transmorphic*
*key*: 1 *x* *keydim*
*result*: *r_key x c_key*

**asarray_remove(***A***,** *key***)**:
*A*: *transmorphic*
*key*: 1 *x* *keydim*
*result*: *void*

**asarray_contains(***A***,** *key***)**, **asarray_elements(***A***,** *key***)**:
*A*: *transmorphic*
*key*: 1 *x* *keydim*
*result*: 1 *x* 1

**asarray_keys(***A***,** *key***)**:
*A*: *transmorphic*
*key*: 1 *x* *keydim*
*result*: *n* *x* *keydim*

**asarray_first(***A***)**:
*A*: *transmorphic*
*result*: *transmorphic*

**asarray_first(***A***,** *loc***)**:
*A*: *transmorphic*
*loc*: *transmorphic*
*result*: *transmorphic*

**asarray_key(***A***,** *loc***)**:
*A*: *transmorphic*
*loc*: *transmorphic*
*result*: 1 *x* *keydim*

**asarray_contents(***A***,** *loc***)**:
*A*: *transmorphic*
*loc*: *transmorphic*
*result*: *r_key* *x* *c_key*

**asarray_notfound(***A***,** *notfound***)**:
*A*: *transmorphic*
*notfound*: *r* *x* *c*
*result*: *void*

**asarray_notfound(***A***)**:
*A*: *transmorphic*
*result*: *r* *x* *c*

__Diagnostics__

None.

__Source code__

asarray.mata