Statalist



RE: st: label after recode


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: label after recode
Date   Thu, 20 Mar 2008 13:34:33 -0000

Naturally I support Svend's general -- and very Stataish -- stance that
Stata should make it difficult for you to stomp on your data.

But equally it seems to me that the whole purpose of -recode- is to
change your data! 

This is what it says: 

"-recode- changes the values of numeric variables according to the rules
specified."

I am tempted to say -- which part of "changes the values" is unclear
here? 

Otherwise put, -recode- is already a sibling of -replace-. That is its
job. 

My guess is that -recode- divides the Stata world. There are probably
many users (for example, many sociologists using survey data) for whom
it is in their top 10 commands. Having read in their data and had a
quick look around, just about the next thing is to get into recoding.
Presumably they get to learn exactly what -recode- does and internalise
most of its somewhat idiosyncratic syntax. Many probably grew up on
similar commands in other packages. There are immigrants to Stataland
whose third question is probably "How do I recode?".

There are probably many other users who use it only occasionally, and
depending on their preconceptions about what it should do, they may be
surprised at what it actually does. Chris seems surprised that the
command does what it claims. Normally, that is regarded as a feature,
not a bug or limitation.

For various reasons, principally my tendency to use continuous
variables much more, I rarely use -recode-. I would rather spell out a
sequence of commands using -generate-, -replace- and -label-. That is
just a question of taste.
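
For illustration only, here is a minimal sketch of that spelled-out
route applied to a swap like Chris's. The toy data, the coding
1 = G, 2 = T, and the names z1, z2, z1lab and z2lab are all assumptions
made for the example, not taken from his dataset.

```stata
* Toy data standing in for the real variable (assumed coding: 1 = G, 2 = T)
clear
input byte z1
1
2
2
.
end
label define z1lab 1 "G" 2 "T"
label values z1 z1lab

* Swap the codes in a new variable, leaving z1 untouched
generate byte z2 = z1
replace z2 = 2 if z1 == 1
replace z2 = 1 if z1 == 2

* Attach a value label that matches the new coding
label define z2lab 1 "T" 2 "G"
label values z2 z2lab

* Check the mapping before relying on z2
tabulate z1 z2, missing
```

It is more typing than one -recode-, but each step is explicit and the
original variable survives for checking.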

An analogy is with -egen-. Many users, led by Bill Gould himself,
seemingly would rather work out from first principles the equivalent
manipulations, typically involving a -sort-, a -by:- and some fancy
footwork with _n and _N. Given fluency with basics, that is much faster
for them than finding out whether an appropriate -egen- function exists
and checking its precise syntax. Here I can sympathise readily with
both camps, because I sometimes commend the one-line solutions using
-egen- for their simplicity and clarity and I sometimes commend the
first principles route as essential for generality and efficiency.
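
As a hedged illustration of the two camps, the sketch below computes a
group mean both ways. The auto dataset shipped with Stata and the
variable names mean_egen and mean_by are chosen purely for the example.

```stata
sysuse auto, clear

* One-line -egen- route: mean price within each repair-record group
egen mean_egen = mean(price), by(rep78)

* First-principles route: running sums within group, then copy the
* last (complete) value to every observation in the group
bysort rep78: generate double mean_by = sum(price) / sum(!missing(price))
by rep78: replace mean_by = mean_by[_N]

* The two results should agree up to float rounding
assert reldif(mean_egen, mean_by) < 1e-6
```

The -by:- version generalises easily, for example to running means or
to the value in the first observation of each group, which is where
the first principles route tends to pay off.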

Nick 
n.j.cox@durham.ac.uk 


Svend Juul wrote:

Chris wrote:
 
in order to fit the distribution of wildtypes/genotypes in my 
population, I want to change the order of the values coded in
z1_gene_x.
 
I'm using the recode command:
 
. tab z1_gene_x
 
group(z1_ge |
ne)         |    Freq.    Percent      Cum.
------------+-----------------------------------
          G |        8       0.17      0.17
          T |    4,684      99.83    100.00
------------+-----------------------------------
      Total |    4,692     100.00
 
. recode z1_gene_x 1=2 2=1
(z1_gene_x: 4692 changes made)
 
. tab z1_gene_x
 
group(z1_ge |
ne)         |    Freq.    Percent      Cum.
------------+-----------------------------------
          G |    4,684      99.83     99.83
          T |        8       0.17    100.00
------------+-----------------------------------
      Total |    4,692     100.00
 
It seems like recode actually does what it is supposed to.
However, it does not change the label, which is somewhat confusing.
According to the manual, recode supports the label option, 
but not for just keeping the label...
 
===============================================================
 
The manual and the online help say about -recode-'s -label()- 
option:
 
   label(name) specifies a name for the value label defined 
         from the transformation rules.  label() may be defined 
         only with generate() (or its synonym, into()) and
         prefix()...
 
You can define value labels within the -recode- command:
 
   . recode z1 (1=2 "G")(2=1 "T") , generate(z2) 
 
The label name becomes -z2-, unless you specify the -label()-
option:
 
   . recode z1 (1=2 "G")(2=1 "T") , generate(z2) label(z2lab)
 
Your example illustrates the danger with -recode- without the
-generate()- option: you may change the meaning of codes in a 
way that may lead to serious mistakes. I wish that -recode- 
required either a -generate()- option or a -replace- option; 
it would be in line with the safety precautions built into 
Stata with other commands.
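
A minimal sketch of the safer route with a check afterwards, using
made-up data. The coding 1 = G, 2 = T and the names z1, z2, z1lab and
z2lab are assumptions for illustration.

```stata
* Toy data (assumed coding: 1 = G, 2 = T)
clear
input byte z1
1
2
2
2
end
label define z1lab 1 "G" 2 "T"
label values z1 z1lab

* Recode into a new variable; the labels travel with the new codes
recode z1 (1=2 "G") (2=1 "T"), generate(z2) label(z2lab)

* Every G should still be G and every T still T, whatever the codes
tabulate z1 z2
label list z2lab
```

The cross-tabulation should show all observations on the G-G and T-T
cells; anything elsewhere signals a rule that changed the meaning of a
code.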



