How do I implement SAS-like ARRAYs in Stata?

Title   Implementing SAS-like ARRAYs in Stata
Author William Gould, StataCorp

SAS provides an ARRAY facility, and whether Stata provides an analog is a popular question on both our help line and Statalist. There is an analog, but it is going to take some explaining.

First, let us agree on a problem: I have a list of variables—say, mpg, weight, and displ—and I want to do something to each of them. Just to fix ideas, let's pretend that I want to add 1 to each. Thus one solution is

        . replace mpg = mpg + 1 
        . replace weight = weight + 1 
        . replace displ = displ + 1

That would not be a bad solution if I really did have three variables, but I am using three as an example, and I want you to pretend that I have 100 variables.

If I really wanted to add 1 to each of these variables, I could use foreach:

        . foreach var of varlist mpg weight displ {
                  replace `var' = `var' + 1
          }

foreach has a pretty powerful syntax so, using some other dataset, I could compactly refer to my 100 variables:

        . foreach var of varlist x1-x20 pop* d57 {
                  replace `var' = `var' + 1
          }

For this example, foreach seems most appropriate, but sometimes a while loop is best. Inside a program I might have the following code:

        while "`1'" != "" {
                replace `1' = `1' + 1
                macro shift 
        }

In the above, `1' stands for the variable, and I can refer to it as often as I want just as I did with `var' in the foreach loops above. In my example, I refer to `1' twice: replace `1' = `1' + 1, meaning add 1 to the variable, but that is just my example, and really the replace statement stands for a block of code that does something complicated to the variable.

There are other ways I could code the while loop, such as

       local i = 1 
       while "``i''" != "" {
               replace ``i'' = ``i'' + 1
               local i = `i' + 1
       }

This second method avoids macro shift and is faster.
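
To see the second while loop inside a complete program, here is a minimal sketch. The program name addone, the use of syntax and tokenize to fill the positional macros from an expanded varlist, and the auto dataset shipped with Stata are all assumptions made for illustration.

```stata
* Hypothetical wrapper program; the name addone is an assumption
capture program drop addone
program addone
        version 9.0
        syntax varlist(numeric)
        tokenize `varlist'          // fill the positional macros 1, 2, ...
        local i = 1
        while "``i''" != "" {
                replace ``i'' = ``i'' + 1
                local i = `i' + 1
        }
end

sysuse auto, clear
addone mpg weight
```

Typing addone mpg weight adds 1 to each listed variable; the loop stops when the next positional macro is empty.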

Solution 1

Whichever way I write it, I need to somehow get Stata to understand that my list is “mpg weight displ”. Here is a complete, working program that you may find useful:

 --------------------- BEGIN --- array.ado --- CUT HERE ---
  program array
          version 9.0
          gettoken usrprog 0 : 0
          syntax varlist
          foreach var of local varlist {
                  `usrprog' `var'
          }
  end
  ---------------------- END --- array.ado --- CUT HERE ---

Using the utility, I could solve my problem with

        . program add1
          1. replace `1' = `1' + 1
          2. end

or, alternatively, with

        . program add1
          1. args var
          2. replace `var' = `var' + 1
          3. end

and then

        . array add1 mpg weight displ

To use array, I type array, followed by the name of a program to do something to one variable, followed by a list of variables on which I want the program run. Using my other more-than-100 variable dataset, I could type

        . array add1 x1-x20 pop* d57

There are two steps to using array:

  1. Write another program—enter it interactively, in your do-file, or however—that does whatever it is you want to do to one variable. Call the program what you will. In writing your program, type `1' (open-single-quote, one, close-single-quote) any place you want to refer to the variable. Or, instead, use the args command to name your argument.
  2. Type array, followed by the name of your program, and whatever variables you want your program run on.
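
As an illustration of the two steps, here is a minimal sketch with a different one-variable program. The program name mklog, the ln_ prefix for the new variables, and the auto dataset shipped with Stata are assumptions made for illustration; array is repeated only so that the sketch is self-contained.

```stata
* array as given above, repeated so this sketch runs on its own
capture program drop array
program array
        version 9.0
        gettoken usrprog 0 : 0
        syntax varlist
        foreach var of local varlist {
                `usrprog' `var'
        }
end

* Step 1: a hypothetical program that does something to one variable
capture program drop mklog
program mklog
        args var
        generate ln_`var' = ln(`var')
end

* Step 2: run it on a list of variables
sysuse auto, clear
array mklog mpg weight price
```

This should leave three new variables, ln_mpg, ln_weight, and ln_price, in the dataset.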

Solution 2

You do not have to be dependent on array. Writing your own custom program is pretty easy. To solve my add-1-to-mpg-weight-and-displ problem, I could write

        . foreach var of varlist mpg weight displ {
        .          replace `var' = `var' + 1
        . }

Or, less elegantly, as

        . tokenize mpg weight displ 
        . while "`1'" != "" {
        .          replace `1' = `1' + 1
        .          macro shift
        . }

These are really much more SAS-like solutions. I write a custom program, not a general one.

This last program I want you to understand thoroughly. First, let me give you some background on Stata.

  1. Stata has macros. Macros are one thing standing for another. Macros have names, and that is how you refer to them. I might have a macro named bill. Macros have contents. Macro bill might contain “mpg weight displ”. When I type bill, that just means bill. When I type `bill' (open-single-quote, bill, close-single-quote), however, that means “the contents of the macro named bill”.
  2. Macros can be named bill, bob, mary, var, ....
  3. Stata also has numbered macros. Their names are 1, 2, 3, .... `1' refers to the contents of the macro named 1, `2' to the contents of the macro named 2, and so on.
  4. The numbered macros are sometimes called positional macros.
  5. The tokenize whatever command fills in the positional macros. If I type tokenize mpg weight displ, Stata sets macro 1 to contain “mpg”, 2 to contain “weight”, and 3 to contain “displ”.
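
The five points above can be tried out in a short do-file. This is only a sketch; it uses display to show what each macro reference substitutes to, and no dataset is needed because nothing here touches a variable.

```stata
local bill "mpg weight displ"
display "bill"          // the literal word: bill
display "`bill'"        // the contents: mpg weight displ

tokenize `bill'
display "`1'"           // mpg
display "`2'"           // weight
display "`3'"           // displ
display "[`4']"         // [] because macro 4 is empty
```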

That is how macros work. Now let us look at our less elegant program again and understand it:

        (1)     . tokenize mpg weight displ 
        (2)     . while "`1'" != "" {
        (3)     .          replace `1' = `1' + 1
        (4)     .          macro shift
        (5)     . }

  1. Look at line 1. All we are doing is putting “mpg” in 1, “weight” in 2, and “displ” in 3.
  2. Look at line 3: replace `1' = `1' + 1. Initially `1' is mpg, so Stata sees (and so should you) replace mpg = mpg + 1.
  3. Look at line 4: macro shift. macro shift shifts the numbered macros; it shifts 1 into the waste bin, 2 into 1, 3 into 2, and so on. So, now `1' is “weight” and `2' is “displ”.
  4. Loop back to line 2: while "`1'" != "". Visualize the line as Stata sees it. The first time through, `1' was mpg, and the line read while "mpg" != "". The string "mpg" was not equal to "", so Stata executed the loop, adding 1 to variable mpg. This time, `1' is weight. Since "weight" is not equal to "", Stata will execute the loop again, this time adding 1 to the variable weight. The third time, `1' will be displ. "displ" is not equal to "", and Stata will again execute the loop.

    The fourth time through, `1' will be empty, so the line will read while "" != "". That is how Stata sees it because `1' will substitute to nothing. Is "" not equal to ""? No, they are equal, so the loop will stop.
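
One way to watch this happen is to swap the replace for a display, so that each pass prints the command as Stata would see it. This is a sketch for tracing only; the counter k is a hypothetical addition, and no data are changed.

```stata
tokenize mpg weight displ
local k = 0
while "`1'" != "" {
        display "replace `1' = `1' + 1"    // the line after substitution
        macro shift
        local k = `k' + 1
}
display "passes: `k'"      // passes: 3
display "[`1']"            // [] because macro 1 is now empty
```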

There are other ways I could write this program, and here is one that uses while but avoids using macro shift:

        (1)          . local array "mpg weight displ"
        (2)          . local i = 1
        (3)          . local n : word count `array'
        (4)          . while `i' <= `n' {
        (5)          .         local var : word `i' of `array'
        (6)          .         replace `var' = `var' + 1
        (7)          .         local i = `i' + 1
        (8)          . }

The only new thing here is my use of word `i' of `array' in line 5, and you can probably guess what it does. Make the substitutions. The first time through the loop, line 5 reads

        local var : word 1 of mpg weight displ

because `i' is 1 and `array' is "mpg weight displ" (sans quotes). Word 1 of "mpg weight displ" is mpg, and so mpg is stored in the macro var.

This second solution is a little longer than the previous one, but it has the advantage that I can generalize it to work with paired arrays. For example,

        . local array1 "mpg   weight displ"
        . local array2 "rep78 hdroom trunk"
        . local i = 1
        . local n : word count `array1'
        . while `i' <= `n' {
        .         local var1 : word `i' of `array1'
        .         local var2 : word `i' of `array2'
        .         replace `var1' = `var1' + `var2'
        .         local i = `i' + 1
        . }
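
The same paired-array loop can also be sketched with forvalues, which handles the counter for me. forvalues is not used elsewhere in this FAQ, so treat this as an alternative rather than the method described above. The variable names are spelled as in the auto dataset shipped with Stata (displacement, headroom), which is an assumption about your data.

```stata
sysuse auto, clear
local array1 "mpg   weight   displacement"
local array2 "rep78 headroom trunk"
local n : word count `array1'
forvalues i = 1/`n' {
        local var1 : word `i' of `array1'
        local var2 : word `i' of `array2'
        * add the ith variable of array2 to the ith variable of array1
        replace `var1' = `var1' + `var2'
}
```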

Summary

Let the macro named array contain a list of variable names. For instance,

        local array "mpg weight displ foreign"

The extended macro function word of will pull the ith word from the array. For instance, let macro i contain one of the integers 1, 2, 3, or 4. Then

        local x : word `i' of `array'

places the `i'th word of `array' into the macro named x. If i contains 3,

        local i = 3 

and then

        local x : word `i' of `array'

places "displ" in x. In subsequent code, you can use `x' to refer to displ.

You can refer to multiple "arrays" simultaneously:

        local array1 "mpg weight displ"
        local array2 "foreign length turn make"
        ...
        local i = 1
        ...
        local j = 3
        ...
        local x : word `i' of `array1'
        local y : word `j' of `array2'
        ...
        ... `x' ... `y'

In the above, referring to `x' and `y' is equivalent to referring to the selected variable names, and you may use `x' and `y' in any way that you would use a variable name. For example, since Stata variables can be explicitly subscripted—because turn[3] refers to the 3rd observation on variable turn—you can type `y'[3] to refer to the 3rd observation of the `j'th element of array2.
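
As a quick check of the subscripting claim, here is a minimal sketch that assumes the auto dataset shipped with Stata is in memory:

```stata
sysuse auto, clear
local array2 "foreign length turn make"
local j = 3
local y : word `j' of `array2'
display "`y'"           // turn
display `y'[3]          // the value of turn in observation 3
```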

You can form matrices of variable names should that be desirable. Here is a 4 x 3 example:

        local arow1 "mpg weight displ"
        local arow2 "turn gratio foreign"
        local arow3 "rep78 hdroom trunk"
        local arow4 "length price make"
        ...
        /* the following obtains a[3,2], namely hdroom: */
        local x : word 2 of `arow3'
        ...
        /* the following obtains a[`i',`j']: */
        local x : word `j' of `arow`i''

To summarize, to define an M-element array vector named array, type

        local array "varname1 varname2 ... varnameM"

To refer to array[i], type

        local x : word `i' of `array'

and then refer to `x'.

To define an N x M array matrix named matrix, type

        local matrix1 "varname11 varname12 ... varname1M"
        local matrix2 "varname21 varname22 ... varname2M"
        ...
        local matrixN "varnameN1 varnameN2 ... varnameNM"

To refer to matrix[i,j], type

        local x : word `j' of `matrix`i''

and then refer to `x'.
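
To visit every element of such a matrix, the reference can be wrapped in two nested loops. This is a sketch using forvalues, which is an assumption rather than something shown earlier in this FAQ; the 2 x 3 contents and the macros N and M are hypothetical, and display stands in for whatever you would do with `x'.

```stata
local matrix1 "mpg  weight     displacement"
local matrix2 "turn gear_ratio foreign"
local N = 2
local M = 3
forvalues i = 1/`N' {
        forvalues j = 1/`M' {
                local x : word `j' of `matrix`i''
                display "matrix[`i',`j'] = `x'"
        }
}
```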