  # Re: st: RE: Taking averages, etc.

 From David Kantor To statalist@hsphsun2.harvard.edu Subject Re: st: RE: Taking averages, etc. Date Wed, 17 Dec 2003 13:24:01 -0500

At 06:34 AM 12/17/2003 -0600, Fred Wolfe wrote:

Some of this

```
DO IF X1 = 1 and X2 = 3
Compute Y = 3.
ELSE IF X3 = 2 and X4 = 17
Compute Y = 4.
ELSE
Compute Y = 5.
END IF.
```

can be programmed into a cond() function.
I have written about this previously, but I feel compelled to state this again. Some of us may find it confusing, but I always do these kinds of operations using nested cond() functions:

#delimit ;
gen Y =
cond(X1 == 1 & X2 == 3, 3,
cond(X3 == 2 & X4 == 17, 4,
5));
#delimit cr

In these constructs, I always place the next cond() into the third argument (the "else" part) of the preceding cond(). It is efficient because it truly functions as an if-then-else (but in terms of yielding a value rather than directing the program flow). That is, at each step, you only have to be concerned with the conditions not already mentioned. Contrast this with the equivalent set of -replace- operations:

As Fred wrote:

gen y = 3 if X1==1 & X2==3
replace y = 4 if X1 !=1 & X2 !=3 & X3==2 & X4==17
replace y = 5 if X1 !=1 & X2 !=3 & X3 !=2 & X4 !=17

First, this needs a correction:

gen y = 3 if X1==1 & X2==3
replace y = 4 if (X1 !=1 | X2 !=3) & X3==2 & X4==17
replace y = 5 if (X1 !=1 | X2 !=3) & (X3 !=2 | X4 !=17)

Here, at each subsequent -replace-, you must not only consider what conditions to include, but which ones to exclude, so you don't overwrite what came before. And you need to be very careful about specifying those conditions correctly. Thus, you are reiterating conditions when you program it, and re-testing these conditions when you run it.

Thus, I find the cond() construct simpler to program and more efficient to run than a set of -replace- operations.
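As a quick check of the equivalence, here is a minimal sketch that builds both versions on a small made-up dataset and asserts that they agree. The five observations and the names y_cond and y_repl are hypothetical, chosen only so that each branch is exercised at least once.

```stata
* Hypothetical toy data: the rows cover every branch of the if-then-else.
clear
input X1 X2 X3 X4
1 3 2 17
1 3 0  0
0 0 2 17
1 0 2 17
0 0 0  0
end

* Nested cond(): each condition is evaluated as the "else" of the one before.
gen y_cond = cond(X1 == 1 & X2 == 3, 3, cond(X3 == 2 & X4 == 17, 4, 5))

* -replace- chain: every later condition must exclude the earlier ones.
gen     y_repl = 3 if X1 == 1 & X2 == 3
replace y_repl = 4 if (X1 != 1 | X2 != 3) & X3 == 2 & X4 == 17
replace y_repl = 5 if (X1 != 1 | X2 != 3) & (X3 != 2 | X4 != 17)

* Stops with an error if the two versions disagree on any observation.
assert y_cond == y_repl
```

The -assert- is the useful part: if one of the exclusion conditions in the -replace- chain is wrong, it fails instead of leaving a silent discrepancy.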

The simplicity of understanding such a construct holds when I follow the rule mentioned above: that you nest a cond() inside the third argument of another cond(). Thus, the general pattern is...

gen y =
cond(condition1, value1,
cond(condition2, value2,
cond(condition3, value3,
...
final_value )))...);

You *can* have more complex patterns, where you put a cond() in the second argument of another cond() (or, more commonly, in both the second and third arguments), and these are indeed much more confusing to follow. I have done this, but rarely; it requires much greater care to understand it and to be certain that I have done it correctly. In these situations, I would agree that the construct is confusing.

(There are even more weird possibilities that I have not even considered: a cond() inside the first -- or the optional fourth -- argument of another cond(). These would indeed be confusing.)
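For anyone who has not used the optional fourth argument: as I understand it, cond(x, a, b, c) returns c when x itself evaluates to missing, and without it a missing x is treated like any other nonzero value, that is, as true. A minimal sketch on a made-up variable x (the names flag3 and flag4 are hypothetical):

```stata
* Hypothetical toy data: x is true, false, and missing in turn.
clear
input x
1
0
.
end

* Three-argument form: missing x is nonzero, so it is treated as true.
gen flag3 = cond(x, 1, 0)

* Four-argument form: the fourth argument is returned when x is missing.
gen flag4 = cond(x, 1, 0, .)

list
```

Note that a comparison such as X1 == 1 never evaluates to missing (it is simply false when X1 is missing), so the fourth argument matters only when the first argument is a value that can itself be missing.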

I should add that one reason that these constructs are sometimes hard to follow is that the arguments are separated by the same token: the comma. Thus in cond(a, b, c), the first comma corresponds to "then", and the second comma corresponds to "else". So in a complex expression, it is hard to tell whether a given comma is a "then" or an "else". The SPSS equivalent is easier to follow because it uses the distinct tokens "then" and "else".
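One way to recover some of that readability is to label each comma's role in a comment. The sketch below assumes it is run from a do-file (the /// continuation is not available interactively) and uses a made-up two-row dataset only so that it runs on its own.

```stata
* Hypothetical toy data so the example is self-contained.
clear
input X1 X2 X3 X4
1 3 0  0
0 0 2 17
end

* Each line ends with a comment naming the role of the preceding value.
gen Y = cond(X1 == 1 & X2 == 3,  3,   /// if ...      then 3
        cond(X3 == 2 & X4 == 17, 4,   /// else if ... then 4
                                 5))  // else 5

list
```

This changes nothing about how the expression is evaluated; it only puts the "then" and "else" back where the eye can find them.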

Finally, it is worth noticing what happens when some of the conditions are overlapping.

gen y =
cond(condition1, value1,
cond(condition2, value2,
value3));

If there are some observations for which condition1 and condition2 are both true, then the value will be value1 for those particular observations. Thus, the earlier-occurring condition takes precedence.

But in a series of -replace- operations,

gen z = 13
replace z = 14 if condition_a
replace z = 15 if condition_b

then the later-occurring condition takes precedence (by overwriting the earlier value).
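To see the difference in precedence concretely, here is a minimal sketch with two made-up indicator variables, a and b, standing in for the two conditions. Only the first observation satisfies both.

```stata
* Hypothetical toy data: the first row is the overlap case.
clear
input a b
1 1
1 0
0 1
0 0
end

* cond(): the first matching condition wins, so the overlap row gets 14.
gen y = cond(a == 1, 14, cond(b == 1, 15, 13))

* -replace-: the last matching statement wins, so the overlap row gets 15.
gen z = 13
replace z = 14 if a == 1
replace z = 15 if b == 1

list

* The two results differ only where both conditions hold.
assert y == z if !(a == 1 & b == 1)
assert y == 14 & z == 15 if a == 1 & b == 1
```

So when translating between the two styles, either make the conditions mutually exclusive or reverse the order in which they are listed.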

I hope this is useful to some of you.
-- David

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404
