The Stata listserver

RE: st: Puzzled by behavior of -recode varlist (.a/.z=.)-


From   Richard Williams <Richard.A.Williams.5@nd.edu>
To   statalist@hsphsun2.harvard.edu, <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Puzzled by behavior of -recode varlist (.a/.z=.)-
Date   Fri, 23 Apr 2004 19:44:08 -0500

At 07:29 PM 4/23/2004 +0100, Nick Cox wrote:
But .a etc. are being compared as string equivalents,
".a" etc., presumably because of this behaviour of -inrange()-
with numeric missings. And it makes sense to ask
whether some string lies between ".a" and ".z"
in ASCII order.

Nick
n.j.cox@durham.ac.uk
You are right; this is not the problematic part of the code. But I still think the problem lies with -inrange()-. I did a -set trace on-, and part of the output was this:

= replace __000003 = . if __000002==1 & inrange(float(var1),float(.a),float(.z))

Now, I assume the intent of the -inrange()- part of this statement is that it should be true if, say, var1 = .g. But if in Stata I type

. dis inrange(float(.g),float(.a),float(.z))
0

i.e., -inrange()- says .g does NOT fall between .a and .z. Conversely, if in Stata I type

. dis inrange(float(7),float(.a),float(.z))
1

i.e., Stata says that 7 DOES fall between .a and .z! That is, -inrange()- is evaluating things in exactly the opposite way from how it should (or rather, from how we want it to!)

Why? Here again is the documentation for -inrange()-:

"inrange(z,a,b) returns 1 if it is known that a <= z <= b; otherwise, this function returns 0. If z = missing (.) this function returns 0. For numeric arguments, if a = missing and/or b = missing these values are interpreted as a = -infinity and/or b = +infinity, respectively."

So, according to the documentation, (1) if z is missing, the function always returns 0 (false), regardless of what a and b are; and (2) if both a and b are missing, the range runs from negative infinity to positive infinity, so any nonmissing value falls within it.

As a result, values that are missing do NOT get recoded, and values that are NOT missing do! -inrange()- is behaving exactly as documented, but the outcome is exactly the opposite of what was intended.

I bet this is a holdover from earlier versions of Stata, which had only one missing value. With a single missing value, it makes sense to use . as the lower bound as a shorthand for negative infinity, and . as the upper bound as a shorthand for positive infinity. But once .a through .z were added as missing values, this behavior of -inrange()- made much less sense. Basically, with -inrange()-, ANY missing value as the lower bound stands for negative infinity, and ANY missing value as the upper bound stands for positive infinity. Indeed, it doesn't even matter what order the missing values are in, e.g.,

. display inrange(7, .g, .a)
1
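To make the contrast concrete, here is a small do-file sketch, assuming Stata 8 or later (i.e., a version with the extended missing values .a to .z). The first lines simply repeat the -inrange()- results shown above; the last two use plain relational operators instead. Those operators follow Stata's sort order for missing values (every nonmissing number < . < .a < .b < ... < .z), so they give the answers the recode rule was meant to produce.

```stata
* -inrange()- reads any missing bound as -infinity/+infinity
display inrange(.g, .a, .z)        // 0: z itself is missing
display inrange(7, .a, .z)         // 1: bounds act as -inf and +inf
display inrange(7, .g, .a)         // 1: order of the missing bounds is ignored

* Relational operators respect the ordering . < .a < ... < .z
display (.g >= .a & .g <= .z)      // 1: .g lies between .a and .z
display (7 >= .a & 7 <= .z)        // 0: 7 is below every missing value
```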

As Renzo noted, there is a workaround for his original problem: use

recode varlist (missing=.)

instead of

recode varlist (.a/.z=.)

But of course, the user first has to realize that the recode did not work as intended! Plus, the workaround would be much more tedious if, say, you wanted

recode varlist (.a/.q=.)

i.e., if you wanted to recode some missing values but not all of them.
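For the partial case, one possible workaround that avoids -inrange()- altogether is to loop over the variables with -replace- and relational operators, relying on the same ordering . < .a < ... < .z. This is a minimal sketch assuming Stata 8 or later; the variables var1 and var2 and their values are made up purely for illustration.

```stata
* Hypothetical example data: a nonmissing value plus several
* extended missing codes
clear
set obs 5
generate var1 = 7
replace var1 = .a in 2
replace var1 = .g in 3
replace var1 = .r in 4
replace var1 = .z in 5
generate var2 = var1

* Recode only .a through .q to system missing (.), leaving 7,
* .r and .z untouched. Nonmissing values are smaller than .a,
* so they fail the first condition and are not changed.
foreach v of varlist var1 var2 {
    replace `v' = . if `v' >= .a & `v' <= .q
}

list var1 var2
```

Because the conditions are ordinary comparisons rather than -inrange()- bounds, a missing code used as a bound keeps its place in the missing-value order instead of being read as an infinite limit.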

It is a pretty esoteric problem, but it would be nice if Stata could fix it. It seems that -recode- and/or -inrange()- could use some modification.




-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
