Re: Re: st: < and > operand in recode
In a nutshell, Svend is right on fundamentals, my prejudice was
showing and StataCorp too are partly to blame.
Naturally, I'll explain.
As a long-time Stata user, I've been aware of -recode- for a while, but
evidently all the changes from its last major rewrite (about
two versions ago?) have yet to sink into my consciousness.
The main features of the help for -recode- in Stata 9
are visible, even if you have Stata <9, at
Incidentally, in answer to a supplementary question
from Brandy Water, this shows that -recode- is documented
in [D] in Stata 9, not in [R] as in some previous versions.
[D] is new in the Stata 9 documentation.
The intent of -recode- is, according to StataCorp, not me,
to "[r]ecode categorical variables", but it is also
prominent in the help that
"A range #1/#2 refers to all (real and integer) values between #1 and
#2, including the boundaries #1 and #2. This interpretation of #1/#2
differs from that in numlists."
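To make that difference concrete, here is a minimal sketch on a few made-up values (the data and the variable names x and y are illustrative only). Under -recode-, the rule (1/2 = 99) catches 1, 1.5 and 2, while the same 1/2 written as a numlist expands to just the integers 1 and 2.

```stata
clear
input x
0.5
1
1.5
2
2.5
end
* In -recode-, 1/2 means every value from 1 to 2 inclusive, so 1.5 is caught too
recode x (1/2 = 99), generate(y)
list x y
* In a numlist, 1/2 expands to the integers 1 and 2 only
numlist "1/2"
display "`r(numlist)'"
```

Values outside the range (0.5 and 2.5 here) are copied unchanged into y.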
So various interpretations are possible:
1. Svend and others who use -recode- for categorical variables can
also use it happily for continuous variables. This extra functionality,
as Svend lucidly explains, is to them a considerable feature.
2. To save my face, I remain uneasy about a command ostensibly for
one purpose that extends to a quite different purpose based on
an idiosyncratic syntax. While the usefulness of -recode- is
considerable, its design is to me a little objectionable. Still, that
shouldn't matter to anyone unless your taste overlaps in this respect
with mine. I use -recode- much less than once a year, as I would much
rather work out solutions using other commands than look at the
help and re-learn unfamiliar syntax.
Stepping back from this thread, I identify two levels of discussion:
* What ways exist to solve the problem.
* What will be clearest to others reading the code,
say the Stata user in question at some later date, or colleagues or
students who may want to read the code (and who are possibly not Stata
users).
Svend's code, for example, is highly readable, except that the
reader must know from some source outside the code that
if intervals overlap, the first interval specification wins.
BW wanted to recode a variable:
<=1 coded 1 (<= meaning less than or equal to)
>1 to <=2 coded 2
>2 to <=5 coded 3
>5 to <=10 coded 4
>10 coded 5
- and that gave rise to quite a few comments and useful suggestions.
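For comparison with the -recode- solution, here is a sketch of one route that avoids -recode- altogether. It assumes a Stata version that provides the irecode() function, which is not mentioned elsewhere in this thread, and it uses invented data. irecode(x, 1, 2, 5, 10) returns 0 for x <= 1, 1 for 1 < x <= 2, and so on up to 4 for x > 10, so adding 1 gives BW's codes 1 to 5. The second line does the same with a sum of true/false tests, which needs an explicit guard because Stata treats missing as larger than any number.

```stata
clear
input x
0.5
1
1.5
2
5
7
10
12
.
end
* irecode() returns 0 for x <= 1, 1 for 1 < x <= 2, ..., 4 for x > 10; missing x gives missing
generate newx = 1 + irecode(x, 1, 2, 5, 10)
* Same grouping as a sum of logical tests; the if-qualifier keeps missing x out of group 5
generate newx2 = 1 + (x > 1) + (x > 2) + (x > 5) + (x > 10) if !missing(x)
list
```

Either way, boundary values such as 1, 2, 5 and 10 fall into the lower group, matching the <= in BW's specification.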
However, I was surprised by Nick's dislike of -recode-, which he wrote
was not meant to be used for continuous variables. To me, it works
perfectly, and I find it more transparent than any of the other
suggestions. For BW's example it is:
recode x (min/1=1)(1/2=2)(2/5=3)(5/10=4)(10/max=5) , generate(newx)
If intervals overlap, the first interval specification wins, so if you
want 1, 2, 5 and 10 in the upper groups, just reverse the sequence:
recode x (10/max=5)(5/10=4)(2/5=3)(1/2=2)(min/1=1) , generate(newx)
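A quick check of the first-interval-wins rule on the four boundary values, using made-up data (the variable names lower and upper are illustrative): with the original ordering each boundary value goes to the lower group, as BW's <= specification requires, and with the reversed ordering it moves up one group.

```stata
clear
input x
1
2
5
10
end
* Original order: (1/2=2) comes before (2/5=3), so x == 2 is coded 2
recode x (min/1=1)(1/2=2)(2/5=3)(5/10=4)(10/max=5), generate(lower)
* Reversed order: (2/5=3) now comes before (1/2=2), so x == 2 is coded 3
recode x (10/max=5)(5/10=4)(2/5=3)(1/2=2)(min/1=1), generate(upper)
list x lower upper
```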
-recode- also lets you define value labels at once:
recode x (min/1=1 "(0-1]")(1/2=2 "(1-2]") ... , generate(newx)
So, my conclusion is that -recode- is VERY good for creating grouped
variables from continuous variables.
As stated by others, the -generate()- option is extremely important. To
prevent accidents it might have been wise to require a -replace- option
if the user wanted to overwrite an existing variable. But that is
probably too late now.
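A minimal sketch of why generate() matters, again on made-up data with illustrative variable names (x_orig is a hypothetical backup copy): with generate(), the original x survives; without it, -recode- replaces the values of x in place.

```stata
clear
input x
0.5
3
12
end
generate x_orig = x
* With generate(), the grouped values go into newx and x is left unchanged
recode x (min/1=1)(1/2=2)(2/5=3)(5/10=4)(10/max=5), generate(newx)
assert x == x_orig
* Without generate(), -recode- overwrites x itself and the original values are gone
recode x (min/1=1)(1/2=2)(2/5=3)(5/10=4)(10/max=5)
list
```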