Is there a way to tell Stata to try all values of a particular variable in a foreach statement without specifying them?

Title   Making foreach go through all values of a variable
Author Nicholas J. Cox, Durham University, UK

foreach offers a way of repeating one or more Stata commands; see also [P] foreach. One common pattern is to cycle through all values of a classifying variable. Thus, with the auto data, we could cycle through all the values of foreign or rep78.

 . foreach i in 0 1 {
 .     whatever if foreign == `i'
 . }

 . foreach i of num 1/5 {
 .     whatever if rep78 == `i'
 . }

In these examples, there is also a solution using forvalues; see also [P] forvalues.

 . forval i = 0/1 {
 .     whatever if foreign == `i'
 . }

 . forval i = 1/5 {
 .     whatever if rep78 == `i'
 . }

Because foreach is in many ways more general (for example, it can be used to cycle through a set of string values), we will concentrate on it here.

The question asks how to go through all values without specifying them. In practice, this is useful if you do not know all the values at the time you type the command, if you are writing code to be used with different sets of values, or if you know the values but wish to avoid typing a long list. One common application is looping over panel or other data whose identifiers follow some irregular sequence.

This FAQ covers three ways of doing it. The first method always works, the second method usually works, and the third method uses brute force but is occasionally defensible.

Method 1

This three-step process always works.

 . egen group = group(varname)

This first step maps the distinct values of varname to 1, 2, 3, and up to the number of distinct values.

 . su group, meanonly

Among other things, this second step leaves behind in r(max) the number of distinct values; see [R] summarize for details on saved results.

Then we use

 . foreach i of num 1/`r(max)' {
 .     whatever if group == `i'
 . }

In practice, it is simpler and more efficient to use forvalues instead:

 . forvalues i = 1/`r(max)' {
 .     whatever if group == `i'
 . }

In either case, the r-class result r(max) is used immediately after summarize produces it. This is recommended because r-class results are ephemeral: the next r-class command that runs will overwrite them.
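The three steps can be put together as a minimal sketch. It assumes the auto dataset shipped with Stata, uses rep78 in place of varname, and uses count as a stand-in for whatever; the local macro name ngroups is an illustrative choice, not part of the original FAQ. Copying r(max) into a local macro straight away is one way to keep the value safe from later r-class commands.

```stata
* Sketch only: auto data, with rep78 standing in for varname
sysuse auto, clear

* Step 1: map the distinct nonmissing values of rep78 to 1, 2, 3, ...
egen group = group(rep78)

* Step 2: r(max) now holds the number of distinct values; copy it at once
summarize group, meanonly
local ngroups = r(max)

* Step 3: loop over the groups; count stands in for "whatever"
forvalues i = 1/`ngroups' {
    count if group == `i'
}
```

Observations with missing rep78 get a missing value of group by default, so they are not visited by the loop.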

That is a few lines of code, but this solution has two key advantages:

  • It works for all kinds of variables (integer, other numeric, and string).
  • It extends easily to the apparently much more difficult problem of going through all the distinct combinations of two or more variables, which is, in fact, only a little more difficult. Instead of one varname, you just need to spell out a varlist in the argument to egen:

 . egen group = group(varlist)

The rest of the code is exactly the same as before.
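As a hedged illustration of the varlist case, the sketch below again assumes the auto data and loops over the combinations of foreign and rep78 that occur in the data. The choice of variables, the macro name ngroups, and the use of summarize as the repeated command are assumptions made for the example.

```stata
* Sketch only: one group per observed combination of foreign and rep78
sysuse auto, clear
egen group = group(foreign rep78)

* Keep the number of combinations before any other r-class command runs
summarize group, meanonly
local ngroups = r(max)

forvalues i = 1/`ngroups' {
    * summarize stands in for "whatever"; it is itself r-class
    summarize mpg if group == `i', meanonly
    display "group `i': mean mpg = " r(mean)
}
```

Only combinations that actually occur are numbered, so the loop never visits an empty cell.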

Method 2

The command levelsof is used to produce a list of the distinct values in a variable, which can be particularly useful when the variable is integer-valued or string-valued. (The corresponding Stata 8 command is levels.)

If, with the auto data, you type

 . levelsof rep78

Stata displays

        1 2 3 4 5

and also leaves behind those values in r(levels). You can also ask for a copy to be placed in a local macro of your choice and then loop over that macro:

 . levelsof varname, local(levels)
 . foreach l of local levels {
 .     whatever if varname == `l'
 . }

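foreach cycles through all of the values fed to it within the local macro levels. If varname is a string variable, the macro reference must be enclosed in double quotes, as in varname == "`l'".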
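The sketch below shows the pattern for a numeric and a string variable, assuming the auto data. The string variable origin is hypothetical: it is created here with decode purely so that there is a string variable with few distinct values to loop over, and count stands in for whatever.

```stata
sysuse auto, clear

* Numeric case: rep78 takes integer values
levelsof rep78, local(levels)
foreach l of local levels {
    count if rep78 == `l'
}

* String case: origin is a hypothetical string copy of foreign's value labels
decode foreign, generate(origin)
levelsof origin, local(origins)
foreach o of local origins {
    * compound double quotes also cope with values that contain quotes
    count if origin == `"`o'"'
}
```

Plain double quotes around the macro reference are enough when the values cannot themselves contain double quotes.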

This method may not work well when the values of varname have fractional parts, such as 1.1 or 3.14159, because of precision problems of the type documented at [U] 13.11 Precision and problems therein. Fortunately, people who want to do this usually need to cycle through categories defined by integer or string codes.
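A small sketch of the underlying issue, using a made-up variable x stored as float: the literal 1.1 is evaluated in double precision, so it does not equal the float approximation held in x. Wrapping the value in float() inside the loop is one possible workaround when the variable is known to be stored as float; that workaround is an assumption added here, not something this FAQ prescribes.

```stata
clear
set obs 3

* x is stored as float, so 1.1 is held only approximately
generate float x = 1.1

* The literal 1.1 is evaluated in double precision and does not match
count if x == 1.1

* Rounding the literal to float precision first restores the match
count if x == float(1.1)

* The same idea inside a levelsof loop
levelsof x, local(levels)
foreach l of local levels {
    count if x == float(`l')
}
```

The first count reports 0 observations and the second reports 3.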

Method 3

Suppose that you know some variable, say, the number of children in a family, takes on most of the integers between 1 and 20 but not necessarily all of them. You can make Stata try all those values and trap the cases in which the command fails.

 . foreach i of num 1/20 {
 .     capture noisily whatever if nchildren == `i'
 . }

Here capture traps any error that would otherwise stop the foreach loop, and noisily ensures that we still see the output.

Once more, note the alternative solution with forvalues:

 . forval i = 1/20 {
 .     capture noisily whatever if nchildren == `i'
 . }

This method is crude, but it is a good practical solution whenever it is quicker for Stata to find out for itself which cases will not work than for you to puzzle out more careful code. It could be a good method if you knew that there were only a few gaps in a sequence.
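A hedged sketch of the brute-force approach, assuming the auto data: rep78 takes the values 1 to 5, so looping over 0 to 6 deliberately includes two gaps. regress stands in for whatever, and the message printed for skipped values is an illustrative addition. After capture, the return code of the captured command is available in _rc, with 0 meaning success.

```stata
sysuse auto, clear

* rep78 takes values 1 to 5, so 0 and 6 are deliberate gaps in this sketch
forvalues i = 0/6 {
    capture noisily regress mpg weight if rep78 == `i'

    * save the return code before another command can change it
    local rc = _rc
    if `rc' != 0 {
        display as text "rep78 == `i' skipped, return code `rc'"
    }
}
```

Checking _rc in this way keeps a record of which values were skipped, which helps distinguish genuine gaps from other kinds of failure.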