How do I implement SAS-like ARRAYs in Stata?
Title: Implementing SAS-like ARRAYs in Stata
Author: William Gould, StataCorp
Date: April 1999; updated February 2003
SAS provides an ARRAY facility, and
whether Stata provides an analogy is a popular question on both our help
line and Statalist. There is an analogy, but it
is going to take some explaining.
First, let us agree on a problem: I have a list of variables—say,
mpg, weight, and displ—and I want to do something to each of them.
Just to fix ideas, let's pretend that I want to add 1 to each. Thus one
solution is
. replace mpg = mpg + 1
. replace weight = weight + 1
. replace displ = displ + 1
That would not be a bad solution if I really did have three variables, but I
am using three as an example, and I want you to pretend that I had 100
variables.
If I really wanted to add 1 to each of these variables, I could use
foreach:
. foreach var of varlist mpg weight displ {
replace `var' = `var' + 1
}
foreach has a pretty powerful syntax, so, using some other dataset, I
could compactly refer to my 100 variables:
. foreach var of varlist x1-x20 pop* d57 {
replace `var' = `var' + 1
}
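To try the first loop without a real dataset, here is a minimal sketch. It uses made-up data: three observations of variables named mpg, weight, and displ with arbitrary values. It also makes copies with the hypothetical prefix old_, which exist only so the result can be checked.

```stata
// Made-up data: the names follow the text, the values are arbitrary
clear
set obs 3
generate mpg    = 20 + _n
generate weight = 3000 + _n
generate displ  = 100 + _n

// Hypothetical copies, kept only to verify the effect of the loop
foreach var of varlist mpg weight displ {
    generate old_`var' = `var'
}

// The loop from the text: add 1 to each listed variable
foreach var of varlist mpg weight displ {
    replace `var' = `var' + 1
}
```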
For this example, foreach seems most appropriate, but sometimes a while
loop is best. Inside a program I might have the following code:
while "`1'" != "" {
replace `1' = `1' + 1
macro shift
}
In the above, `1' stands for the variable, and I can
refer to it as often as I want, just as I did with `var'
in the foreach loops above. In my example, I refer to
`1' twice, replace `1' = `1' + 1, meaning add 1 to the
variable. That is just my example, though; really the replace statement
stands for a block of code that does something complicated to the variable.
There are other ways I could code the while loop, such as
local i = 1
while "``i''" != "" {
replace ``i'' = ``i'' + 1
local i = `i' + 1
}
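Here ``i'' is a macro inside a macro. Stata first expands `i' to 1, 2,
3, and so on, and the outer pair of quotes then retrieves the contents of
the numbered macro `1', `2', `3', .... This second method avoids
macro shift and is faster.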
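Here is a minimal sketch of the second while loop wrapped in a complete program. The program name add1each and the toy data are hypothetical. The point is that the words typed after the command name arrive in the numbered macros 1, 2, 3, and so on, which is what the loop walks through.

```stata
// Hypothetical program name; any name that is not already a command works
capture program drop add1each
program add1each
    version 9.0
    // The arguments are already in the numbered macros 1, 2, 3, ...
    local i = 1
    while "``i''" != "" {
        // The nested reference expands i first, then the numbered macro it names
        replace ``i'' = ``i'' + 1
        local i = `i' + 1
    }
end

// Made-up data to run it on
clear
set obs 2
generate mpg = 10
generate weight = 2000
add1each mpg weight
```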
Whichever way I write it, I need to somehow get Stata to understand that my
list is “mpg weight displ”. Here is a complete, working program
that you may find useful:
--------------------- BEGIN --- array.ado --- CUT HERE ---
program array
version 9.0
gettoken usrprog 0 : 0
syntax varlist
foreach var of local varlist {
`usrprog' `var'
}
end
---------------------- END --- array.ado --- CUT HERE ---
Using the utility, I could solve my problem with
. program add1
1. replace `1' = `1' + 1
2. end
or, alternatively, with
. program add1
1. args var
2. replace `var' = `var' + 1
3. end
and then
. array add1 mpg weight displ
To use array, I type array, followed by the name of a program
that does something to one variable, followed by a list of variables on
which I want the program run. Using my other more-than-100-variable
dataset, I could type
. array add1 x1-x20 pop* d57
There are two steps to using array:
- Write another program (enter it interactively, in your do-file, or
however) that does whatever it is you want to do to one variable.
Call the program what you will. In writing your program, type
`1' (left single quote, the digit one, right single quote) any
place you want to refer to the variable. Or, instead, use the args
command to name your argument.
- Type array, followed by the name of your program, and
whatever variables you want your program run on.
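The two steps can be sketched end to end in one do-file. The sketch assumes you have not saved array.ado yet, so it defines array inline with the same code. The variables x1 to x3, popa, and popb are made-up stand-ins for the x1-x20 pop* d57 list above.

```stata
// The array program from array.ado, defined inline so this sketch runs on its own
capture program drop array
program array
    version 9.0
    gettoken usrprog 0 : 0
    syntax varlist
    foreach var of local varlist {
        `usrprog' `var'
    }
end

// Step 1: a program that does something to ONE variable
capture program drop add1
program add1
    args var
    replace `var' = `var' + 1
end

// Made-up data standing in for a larger dataset
clear
set obs 2
generate x1 = 1
generate x2 = 2
generate x3 = 3
generate popa = 10
generate popb = 20

// Step 2: name the program, then the variables (ranges and wildcards allowed)
array add1 x1-x3 pop*
```

Because array runs syntax varlist, the range x1-x3 and the wildcard pop* are expanded to individual variable names before add1 is called on each one.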
Solution 2
You do not have to be dependent on array. Writing your own custom
program is pretty easy. To solve my add-1-to-mpg-weight-and-displ problem,
I could write
. foreach var of varlist mpg weight displ {
. replace `var' = `var' + 1
. }
Or, less elegantly, as
. tokenize mpg weight displ
. while "`1'" != "" {
. replace `1' = `1' + 1
. macro shift
. }
These are really much more SAS-like solutions. I write a custom program,
not a general one.
This last program I want you to understand thoroughly. First, let me give
you some background on Stata.
- Stata has macros. Macros are one thing standing for another. Macros
have names, and that is how you refer to them. I might have a macro named
bill. Macros have contents. Macro bill might contain
“mpg weight displ”. When I type bill, that just
means bill. When I type `bill' (left single quote, bill, right single
quote), however, that means “the contents of the macro named bill”.
- Macros can be named bill, bob, mary, var,
....
- Stata also has numbered macros. Their names are 1, 2,
3, .... `1' refers to the contents of the
macro named 1, `2' to the contents of the macro
named 2, and so on.
- The numbered macros are sometimes called positional macros.
- The command tokenize string fills in the positional
macros. If I type tokenize mpg weight displ, Stata sets macro 1
to contain “mpg”, 2 to contain “weight”, and 3 to
contain “displ”.
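A quick way to see the difference between a macro's name and its contents, and to see what tokenize does, is to display them. This small sketch uses the macro name bill from the text and needs no dataset.

```stata
// A macro named bill whose contents are a list of variable names
local bill "mpg weight displ"
display "bill"          // the name itself: prints bill
display "`bill'"        // the contents: prints mpg weight displ

// tokenize fills the positional macros 1, 2, 3
tokenize mpg weight displ
display "`1'"           // prints mpg
display "`2'"           // prints weight
display "`3'"           // prints displ
```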
That is how macros work. Now let us look at our less elegant program again
and understand it:
(1) . tokenize mpg weight displ
(2) . while "`1'" != "" {
(3) . replace `1' = `1' + 1
(4) . macro shift
(5) . }
- Look at line 1. All we are doing is putting “mpg” in 1,
“weight” in 2, and “displ” in 3.
- Look at line 3: replace `1' = `1' + 1.
Initially `1' is mpg, so Stata sees (and so should
you) replace mpg = mpg + 1.
- Look at line 4: macro shift. macro shift shifts the
numbered macros; it shifts 1 into the waste bin, 2 into
1, 3 into 2, and so on. So, now
`1' is “weight” and `2'
is “displ”.
- Loop back to line 2: while "`1'" !=
"". Visualize the line as Stata sees it. The first time
through, `1' was mpg, and the line read while
"mpg" != "". The string "mpg"
was not equal to "", so Stata executed the loop, adding
1 to variable mpg. This time, `1' is weight. Since
"weight" is not equal to "", Stata will
execute the loop again, this time adding 1 to the variable weight. The
third time, `1' will be displ. "displ"
is not equal to "", and Stata will again execute the
loop.
The fourth time through, `1' will be empty, so the line will read
while "" != "". That is how Stata sees it, because `1' will
substitute to nothing. Is "" not equal to
""? No, they are equal, so the loop will stop.
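To watch the loop step through the list, here is a sketch that prints the numbered macro 1 on each pass instead of replacing a variable. The counter macro passes is a hypothetical addition used only for the trace.

```stata
// Same loop as lines 1 to 5, but printing instead of replacing
tokenize mpg weight displ
local passes = 0
while "`1'" != "" {
    local passes = `passes' + 1
    display "pass `passes': the macro named 1 contains `1'"
    macro shift
}
display "loop ended after `passes' passes"
```

It prints mpg on pass 1, weight on pass 2, and displ on pass 3. After the third macro shift, macro 1 is empty and the while condition fails.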
There are other ways I could write this program, and here is one that
uses while but avoids using macro shift:
(1) . local array "mpg weight displ"
(2) . local i = 1
(3) . local n : word count `array'
(4) . while `i' <= `n' {
(5) . local var : word `i' of `array'
(6) . replace `var' = `var' + 1
(7) . local i = `i' + 1
(8) . }
The only new things here are the extended macro functions word count
in line 3, which counts the words in `array', and word `i' of
`array' in line 5, whose meaning you can probably guess.
Make the substitutions. The first time through the loop, line 5 reads
local var : word 1 of mpg weight displ
because `i' is 1 and `array' is "mpg weight
displ" (sans quotes). Word 1 of "mpg weight displ" is mpg,
and so mpg is stored in the macro var.
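The two extended macro functions can be tried on their own. In this sketch, forvalues stands in for the while loop as an equivalent way to count i from 1 to 4. The fourth pass deliberately asks for a word past the end of the list, which comes back empty.

```stata
// The array from the text: a macro holding three variable names
local array "mpg weight displ"

// word count: how many words the macro holds
local n : word count `array'
display "the array has `n' elements"

// word # of: fetch one element at a time; 4 is one past the end
forvalues i = 1/4 {
    local var : word `i' of `array'
    display "element `i' is [`var']"
}
```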
This second solution is a little longer than the previous one, but it has
the advantage that I can generalize it to work with paired arrays. For
example,
. local array1 "mpg weight displ"
. local array2 "rep78 hdroom trunk"
. local i = 1
. local n : word count `array1'
. while `i' <= `n' {
. local var1 : word `i' of `array1'
. local var2 : word `i' of `array2'
. replace `var1' = `var1' + `var2'
. local i = `i' + 1
. }
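The same paired loop can be run on made-up data. This sketch uses forvalues in place of while. It also adds a length check that is not in the text: without it, a shorter array2 would leave var2 empty for the extra words, and the replace would stop with a syntax error instead of a clear message.

```stata
// Made-up data using the variable names from the paired-array example
clear
set obs 2
generate mpg = 20
generate weight = 3000
generate displ = 200
generate rep78 = 3
generate hdroom = 2.5
generate trunk = 10

local array1 "mpg weight displ"
local array2 "rep78 hdroom trunk"

// Added safeguard: the two arrays must have the same number of elements
local n : word count `array1'
local n2 : word count `array2'
if `n' != `n2' {
    display as error "array1 and array2 have different lengths"
    exit 198
}

// Element i of array1 gets element i of array2 added to it
forvalues i = 1/`n' {
    local var1 : word `i' of `array1'
    local var2 : word `i' of `array2'
    replace `var1' = `var1' + `var2'
}
```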
Summary
Let the macro named array contain a list of variable names. For instance,
local array "mpg weight displ foreign"
The extended macro function word # of pulls the #th word
from the array. For instance, let macro i contain one of the integers 1, 2,
3, or 4. Then
local x : word `i' of `array'
places word `i' of `array' into the macro named x. For example, after
local i = 3
local x : word `i' of `array'
macro x contains "displ". In subsequent code, you can use
`x' to refer to displ.
You can refer to multiple "arrays" simultaneously:
local array1 "mpg weight displ"
local array2 "foreign length turn make"
...
local i = 1
...
local j = 3
...
local x : word `i' of `array1'
local y : word `j' of `array2'
...
... `x' ... `y'
In the above, referring to `x' and `y' is equivalent
to referring to the selected variable names, and you may use `x'
and `y' in any way that you would use a variable name. For
example, Stata variables can be explicitly subscripted (turn[3] refers
to the 3rd observation on variable turn), so you can type
`y'[3] to refer to the 3rd observation of the variable named by the
`j'th element of array2.
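Here is a small sketch of explicit subscripting through a macro. The data are made up: a single variable turn with five observations. Only the third element of array2 refers to a variable that actually exists here, and that is the only one the sketch uses.

```stata
// Made-up data: turn takes the values 31 to 35
clear
set obs 5
generate turn = 30 + _n

local array2 "foreign length turn make"
local j = 3
local y : word `j' of `array2'

// The subscripted reference below becomes turn[3]
display `y'[3]
```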
You can form matrices of variable names should that be desirable. Here is a
4 x 3 example:
local arow1 "mpg weight displ"
local arow2 "turn gratio foreign"
local arow3 "rep78 hdroom trunk"
local arow4 "length price make"
...
/* the following obtains a[3,2], namely hdroom: */
local x : word 2 of `arow3'
...
/* the following obtains a[`i',`j']: */
local x : word `j' of `arow`i''
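Looping over both indices shows the nested reference `arow`i'' at work: the inner `i' expands first, giving the name of one row macro, whose contents are then read. This sketch only displays variable names, so it runs with no dataset in memory.

```stata
local arow1 "mpg weight displ"
local arow2 "turn gratio foreign"
local arow3 "rep78 hdroom trunk"
local arow4 "length price make"

// Walk the 4 x 3 matrix row by row
forvalues i = 1/4 {
    forvalues j = 1/3 {
        local x : word `j' of `arow`i''
        display "a[`i',`j'] = `x'"
    }
}
```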
To summarize, to define an M-element array vector named array, type
local array "varname1 varname2 ... varnameM"
To refer to array[i], type
local x : word `i' of `array'
and then refer to `x'.
To define an N x M array matrix named matrix, type
local matrix1 "varname11 varname12 ... varname1M"
local matrix2 "varname21 varname22 ... varname2M"
...
local matrixN "varnameN1 varnameN2 ... varnameNM"
To refer to matrix[i,j], type
local x : word `j' of `matrix`i''
and then refer to `x'.