The Stata listserver

Re: st: Puzzled by behavior of -recode varlist (.a/.z=.)-


From   Richard Williams <[email protected]>
To   [email protected]
Subject   Re: st: Puzzled by behavior of -recode varlist (.a/.z=.)-
Date   Fri, 23 Apr 2004 23:58:44 -0500

At 12:53 PM 4/23/2004 -0400, Renzo Comolli wrote:
> Dear Statalist,
>
> I don't understand the behavior of
> . recode varlist (.a/.z=.)

One other tidbit to add to what I said earlier: Stata 7's -recode- command did not use -inrange-. As a result, Stata 7 and Stata 8 can sometimes produce different results for the same -recode- command. Example:

Stata 7:
. list

var1
1. 1
2. 2
3. 4
4. 7
5. 8
6. 9
7. 12
8. .
9. .
10. .

. recode var1 9/. = 9
(4 changes made)

. list

var1
1. 1
2. 2
3. 4
4. 7
5. 8
6. 9
7. 9
8. 9
9. 9
10. 9

The researcher presumably wanted all values that were 9 or greater (where greater includes missing) to be recoded to 9. That is what happened. But in Stata 8, using the exact same data and syntax,

Stata 8:

. recode var1 9/. = 9
(var1: 1 changes made)

. list

     +------+
     | var1 |
     |------|
  1. |    1 |
  2. |    2 |
  3. |    4 |
  4. |    7 |
  5. |    8 |
     |------|
  6. |    9 |
  7. |    9 |
  8. |    . |
  9. |    . |
 10. |    . |
     +------+


Stata 8 interpreted the recode command as saying that any nonmissing value from 9 up to positive infinity should be recoded as 9. That is probably not what the researcher intended: he or she presumably wanted the 3 missing cases recoded too. If the researcher did want what actually happened, he or she could have said so unambiguously with the command

recode var1 9/max = 9
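
Conversely, if the Stata 7 result is what is wanted (missing values recoded to 9 as well), it is probably safest to say so explicitly rather than rely on how a range ending in . gets interpreted. A minimal sketch, assuming the same var1 as above; use one option or the other, not both:

```stata
* Sketch only: make the treatment of missing values explicit.

* Option 1: give missing values their own rule
* (assumes the -missing- keyword of the newer -recode- syntax)
recode var1 (9/max = 9) (missing = 9)

* Option 2: skip -recode- altogether. Missing values compare as
* larger than any number, so this condition catches 9 and above
* plus every missing value.
replace var1 = 9 if var1 >= 9
```

Either form should behave the same way regardless of how -recode- handles a range that ends in a missing value.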

Now, my guess is that Stata 8's recode behavior will only cause problems under rare and esoteric conditions (although I don't think the above is all that unlikely a scenario). Nonetheless, this behavior strikes me as a bug, and at a minimum it should be documented. Renzo's original command, recode varlist (.a/.z=.), winds up being equivalent to -recode varlist (min/max=.)- which does exactly the opposite of what he intended and is certainly not what people would expect.
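
For what Renzo was after (turning the extended missing values .a through .z into plain .), one workaround that avoids -recode- ranges altogether might look like the sketch below. The names x1 x2 x3 are hypothetical stand-ins for his varlist, and the variables are assumed to be numeric.

```stata
* Sketch only: convert extended missing values (.a, .b, ..., .z)
* to the system missing value (.), leaving everything else alone.
* Relies on the sort order: nonmissing numbers < . < .a < ... < .z
foreach v of varlist x1 x2 x3 {
    replace `v' = . if `v' > .
}
```

Because every nonmissing number is smaller than ., the condition touches only the extended missing codes.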

Also, -inrange-'s behavior (treating a missing bound as negative infinity when it is the lower limit and as positive infinity when it is the upper limit) was OK when there was only one possible value for missing; but now that Stata has a whole range of missing values, it is not so logical.
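
A quick way to probe this is to call -inrange()- directly with missing bounds. This is only a sketch; going by the -recode- results above, I would expect the four lines to display 1, 0, 1 and 0, but it is worth checking on your own version of Stata.

```stata
* Sketch only: how does inrange() treat missing bounds?
display inrange(5, .a, .z)   // both bounds missing
display inrange(5, 9, .)     // value below the lower bound
display inrange(12, 9, .)    // nonmissing value above the lower bound
display inrange(., 9, .)     // the value itself is missing
```

The first line corresponds to Renzo's (.a/.z=.) rule and the last three to the 9/. example.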

Whether you can fix these quirks in -recode- and -inrange- without creating other problems, I don't know.
