
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Recode - a cautionary tale
Date: Thu, 17 Sep 2009 13:34:30 +0100

I agree with Allan that there's a cautionary tale here, but I am not completely sure what Allan thinks it is, so let me try to summarize.

First, let's underline, especially given Allan's title, that the difficulties arose in using the -recode()- function, not the -recode- command. (Some users, presumably because of experience outside Stata and variations in terminology between software, are a little fuzzy on the difference between commands and functions.)

Second, Allan's colleague got bit because

    gen byte LHS = <RHS>

maps <RHS> that evaluates to 101 or more to missing. The punishment here, unfortunately, was that she was given what she asked for, namely a byte variable, with its own (documented) limits. (Stata's pretty weak on "Are you sure?" messages.) As this would have happened regardless of what the <RHS> was, singling out the -recode()- function is hardly the key issue.

Incidentally, I would always prefer to round explicitly using -floor()- or -ceil()- because then I know without looking at any documentation -- and can control -- exactly what the limits are. (That "floor" means down and "ceil" means up is something I can carry in my head.) Thus

    20 * floor(y/20)

rounds down and

    20 * ceil(y/20)

rounds up, both in intervals of 20. However, it is easy to see that others may well prefer the flexibility of -recode()- or -egen, cut()-.

Nick
n.j.cox@durham.ac.uk

Allan Reese (Cefas)

A colleague used the recode function, following the example in [U] 25.1.2. It reported some missing values, but she knew there were some missing items. Unfortunately some actual values also got recoded as missing. The command was:

    gen byte xcat = recode( x, 20, 40, 60, 80, 100, 120)

and the missing values should have been 120.

[U] 12.2.2 lists the ranges for each numeric type, which for byte is -127 to +100, but does not specify what should happen when an out of range value is assigned. I've never had this problem because I'm too idle to save a few bytes by specifying the type. ;-)

Tech support point out that if you don't force Stata to use a "byte" then it will gracefully detect the out of range values and automatically promote to the correct storage type. "But when you specify -generate byte- you are using the advanced syntax and telling Stata that you really want it to stay a byte no matter what values you pass it." In my opinion the advice in 25.1.2 is too Delphic, and the comment that "we (wisely) told Stata to generate the new variable as a byte" can be deleted.

    . clear
    . set obs 3
    obs was 0, now 3
    . generate byte x = _n
    . replace x = x + 200
    x was byte now int
    (3 real changes made)
    . replace x = x + 40000
    x was int now long
    (3 real changes made)
    . replace x = x + .5
    x was long now double
    (3 real changes made)

In giving advice, I had been thinking of the recode command rather than the function: the command makes it easier to handle end intervals with min/max. Another option is egen using cut() which also allows substitution of integer codes labelled with the cutpoint values. Using icodes makes it less likely the byte storage will be overflowed.
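The two points in this thread, explicit rounding with -floor()- and -ceil()- and the effect of forcing a byte, can be tried side by side. What follows is only a minimal sketch under stated assumptions: the six data values are made up for illustration, and the variable names xdown, xup, xcat_byte and xcat are hypothetical, not taken from either post.

```stata
clear
set obs 6

* Illustrative values only (an assumption): 15, 35, 55, 75, 95, 115
generate x = 20 * _n - 5

* Explicit rounding in intervals of 20, as suggested in the reply
generate xdown = 20 * floor(x/20)
generate xup   = 20 * ceil(x/20)

* Forcing byte storage: a result of 120 is outside the byte range
* (-127 to 100), so it should be stored as missing
generate byte xcat_byte = recode(x, 20, 40, 60, 80, 100, 120)

* Leaving the type unspecified uses the default storage type,
* which can hold 120
generate xcat = recode(x, 20, 40, 60, 80, 100, 120)

list
```

With these values, only the last observation (x = 115) should differ between xcat_byte and xcat, which is the situation Allan's colleague ran into.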

References: st: Recode - a cautionary tale
From: "Allan Reese (Cefas)" <allan.reese@cefas.co.uk>

