RE: st: Puzzled by behavior of -recode varlist (.a/.z=.)-
At 07:29 PM 4/23/2004 +0100, Nick Cox wrote:
> But .a etc. are being compared as string equivalents,
> ".a" etc., presumably because of this behaviour of -inrange()-
> with numeric missings. And it makes sense to ask
> whether some string lies between ".a" and ".z"
> in ASCII order.

You are right, this is not the problematic part of the code. But, I still
think it is a problem with -inrange-. I did a -set trace on- and part of
the output was this:
= replace __000003 = . if __000002==1 &
Now, I assume the intent of the latter part of the statement would be that,
if, say, var1 = .g, that part of the statement is true. But if in Stata I type
. dis inrange(float(.g),float(.a),float(.z))
0
i.e. inrange says .g does NOT FALL between .a and .z. Conversely, if in
Stata I type
. dis inrange(float(7),float(.a),float(.z))
1
i.e. Stata says that 7 DOES FALL between .a and .z! That is, inrange is
evaluating things exactly the opposite of the way it should be (or rather,
the way we want it to!)
Why? Here again are the docs for inrange:
"inrange(z,a,b) returns 1 if it is known that a <= z <= b; otherwise, this
function returns 0. If z = missing (.) this function returns 0. For numeric
arguments, if a = missing and/or b = missing these values are interpreted
as a = -infinity and/or b = +infinity, respectively."
So, according to these docs, (1) if z is missing, the statement always
returns false regardless of what a and b are (2) if a is missing and b is
missing, then the range is negative infinity to positive infinity, hence
any nonmissing value will fall in the range.
As a result, values that are missing will NOT get recoded -- and values
that are NOT missing will get recoded! -inrange- is performing exactly as
documented, but exactly the opposite happens of what was intended.
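For contrast, here is a minimal sketch of the same two tests written with
ordinary relational operators instead of -inrange()-. It relies on Stata's
sort order for numeric values (all nonmissing numbers < . < .a < ... < .z);
it is only an illustration, not what -recode- does internally.

```stata
* .g lies between .a and .z in Stata's ordering of missing values
display (.g >= .a) & (.g <= .z)

* 7 is nonmissing, so it sorts below .a and fails the lower-bound test
display (7 >= .a) & (7 <= .z)
```

The first expression displays 1 and the second displays 0, which is the
result the -recode- rule was presumably meant to produce.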
I bet this is a holdover from earlier Stata, where you only had one missing
value. With only one missing-data (md) value, it makes sense that specifying . as the
lower bound is an easy way of specifying negative infinity, and . as the
upper bound stands for positive infinity. But, once .a through .z got
added as MD values, inrange's operation didn't make as much
sense. Basically, with -inrange-, ANY missing value as the lower bound
stands for neg infinity, and ANY missing value for the upper bound stands
for positive infinity. Indeed it doesn't even matter what order the md
values are in, e.g.
. display inrange(7, .g, .a)
1
As Renzo noted, there is a workaround for his original problem, i.e. use
recode varlist (missing=.)
instead of
recode varlist (.a/.z=.)
But of course the user has to realize that the recode did not work as
intended! Plus, the workaround would be much more tedious if, say, you wanted
recode varlist (.a/.q=.)
i.e. you wanted to recode some md values but not all.
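For that partial case, one possible sketch bypasses -recode- altogether and
uses -replace- with explicit comparisons, again relying on the ordering
.a < .b < ... < .z. The toy data and the variable names var1 and var2 are
made up for illustration.

```stata
* Hypothetical toy data containing several extended missing values
clear
input var1 var2
1  .a
.g 2
.r .q
.  3
end

* Set .a through .q to sysmiss (.); leave .r-.z and nonmissing values alone
foreach v of varlist var1 var2 {
    replace `v' = . if `v' >= .a & `v' <= .q
}

list
```

Here .g, .a and .q become ., while .r and the nonmissing values are left
untouched.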
A pretty esoteric problem, but it would be nice if Stata could figure out
how to take care of it. Seems like -recode- and/or -inrange- could use
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc