Re: st: RE: Taking averages, etc.
At 06:34 AM 12/17/2003 -0600, Fred Wolfe wrote:
Some of this
I have written about this previously, but I feel compelled to state this
again. Some of us may find it confusing, but I always do these kinds of
operations using nested cond() functions. An SPSS construct such as
DO IF (X1 = 1 and X2 = 3).
  Compute Y = 3.
ELSE IF (X3 = 2 and X4 = 17).
  Compute Y = 4.
ELSE.
  Compute Y = 5.
END IF.
can be programmed into a cond() function:
gen Y = ///
    cond(X1 == 1 & X2 == 3, 3, ///
    cond(X3 == 2 & X4 == 17, 4, ///
    5))
In these constructs, I always place the next cond() into the third argument
(the "else" part) of the preceding cond(). It is efficient because it
truly functions as an if-then-else (but in terms of yielding a value rather
than directing the program flow). That is, at each step, you only have to
be concerned with the conditions not already mentioned. Contrast this with
the equivalent set of -replace- operations:
As Fred wrote:
gen y = 3 if X1==1 & X2==3
replace y = 4 if X1 !=1 & X2 !=3 & X3==2 & X4==17
replace y = 5 if X1 !=1 & X2 !=3 & X3 !=2 & X4 !=17
First, this needs a correction:
gen y = 3 if X1==1 & X2==3
replace y = 4 if (X1 !=1 | X2 !=3) & X3==2 & X4==17
replace y = 5 if (X1 !=1 | X2 !=3) & (X3 !=2 | X4 !=17)
Here, at each subsequent -replace-, you must not only consider what
conditions to include, but which ones to exclude, so you don't overwrite
what came before. And you need to be very careful about specifying those
conditions correctly. Thus, you are reiterating conditions when you
program it, and re-testing these conditions when you run it.
For these reasons, I find the cond() construct both simpler to program and
more efficient to run than a set of -replace- operations.
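
As a rough check on the equivalence discussed above, the following do-file sketch builds a small made-up dataset (the six rows are illustrative assumptions, not data from this thread) and confirms that the nested cond() and a -replace- chain that excludes every earlier case give the same result. It is meant to be run from a do-file, since it uses /// line continuation.

```stata
clear
input X1 X2 X3 X4
1 3 2 17
1 3 0  0
0 0 2 17
1 0 2 17
0 3 0  0
1 0 2  0
end

* nested cond(): each level only handles what the earlier levels left over
gen Y = cond(X1 == 1 & X2 == 3, 3, ///
        cond(X3 == 2 & X4 == 17, 4, ///
        5))

* -replace- chain: each step must explicitly exclude the earlier cases
gen y = 3 if X1 == 1 & X2 == 3
replace y = 4 if (X1 != 1 | X2 != 3) & X3 == 2 & X4 == 17
replace y = 5 if (X1 != 1 | X2 != 3) & (X3 != 2 | X4 != 17)

* stops with an error if the two versions disagree on any observation
assert Y == y
list
```

Note that Y and y are distinct variables, because Stata variable names are case-sensitive.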
The simplicity of understanding such a construct holds when I follow the
rule mentioned above: that you nest a cond() inside the third argument of
another cond(). Thus, the general pattern is...
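gen y = ///
    cond(condition1, value1, ///
    cond(condition2, value2, ///
    cond(condition3, value3, ///
    default_value)))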
You *can* have more complex patterns, where you would put a cond() in the
second argument of another cond(), (or more commonly, having another cond()
in both the second and third arguments) and indeed these are much more
confusing to follow. I have done this, but rarely, and it requires much
greater care to understand it and to be certain that I have done it
correctly. In these situations, I would agree that the construct is confusing.
(There are even more weird possibilities that I have not even considered: a
cond() inside the first -- or the optional fourth -- argument of another
cond(). These would indeed be confusing.)
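
To illustrate the more complex pattern, here is a small sketch with a cond() in both the second and third arguments of an outer cond(). The variables female and age, the cutoff of 65, and the group codes are hypothetical, chosen only for the example. The second command rewrites the same logic so that every nested cond() sits in the third argument, and the -assert- checks that the two agree on this toy data.

```stata
clear
input female age
0 30
0 70
1 30
1 70
end

* cond() in both the "then" and the "else" argument of the outer cond()
gen group = cond(female == 1, ///
                cond(age >= 65, 4, 3), ///
                cond(age >= 65, 2, 1))

* same logic, with each nested cond() only in the third ("else") argument
gen group2 = cond(female == 1 & age >= 65, 4, ///
             cond(female == 1, 3, ///
             cond(age >= 65, 2, ///
             1)))

assert group == group2
list
```

The flattened form repeats part of a condition (female == 1), but each line can be read on its own as "if this, then that value, else keep going".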
I should add that one reason that these constructs are sometimes hard to
follow is that the arguments are separated by the same token: the
comma. Thus in cond(a, b, c), the first comma corresponds to "then", and
the second comma corresponds to "else". So in a complex expression, it is
hard to tell whether a given comma is a "then" or an "else". The SPSS
equivalent is easier to follow because it uses distinct tokens for "then"
and "else".
Finally, it is worth noticing what happens when some of the conditions are
not mutually exclusive:
gen y = ///
    cond(condition1, value1, ///
    cond(condition2, value2, ///
    value3))
If there are some observations for which condition1 and condition2 are both
true, then the value will be value1 for those particular
observations. Thus, the earlier-occurring condition takes precedence.
But in a series of -replace- operations,
gen z = 13
replace z = 14 if condition_a
replace z = 15 if condition_b
then the later-occurring condition takes precedence (by overwriting the
earlier value).
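
The difference in precedence can be seen in a small sketch. The indicator variables a and b below are hypothetical stand-ins for condition_a and condition_b, and the four rows cover every combination of the two. The two approaches agree everywhere except on the observation where both conditions hold.

```stata
clear
input a b
0 0
1 0
0 1
1 1
end

* cond(): where both conditions are true, the earlier one wins
gen y = cond(a == 1, 14, cond(b == 1, 15, 13))

* -replace-: where both conditions are true, the later one wins
gen z = 13
replace z = 14 if a == 1
replace z = 15 if b == 1

assert y == 14 & z == 15 if a == 1 & b == 1
assert y == z if !(a == 1 & b == 1)
list
```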
I hope this is useful to some of you.
Institute for Policy Studies
Johns Hopkins University