
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: rounding down (and up)
Date: Sun, 22 Jun 2003 18:45:13 +0100

Here is a little nitty-gritty problem, and I do know Stata solutions. My interest is whether there are others I have missed, and above all in views on what is most natural as a solution, and what has fewest possible disadvantages or side-effects. (As a secondary detail I have a proposal for generalising two existing Stata functions.)

I want to round down, in multiples of some (fixed) number. For concreteness, say I want to round -mpg- in the auto data in multiples of 5, so that any values 10-14 get rounded to 10, any values 15-19 to 15, etc. (-mpg- is simple in that only integer values occur; in many other cases we clearly have fractional parts to think about as well.)

Note that the solution is _not_ the function call

    round(mpg, 5)

as this rounds to the nearest multiple of 5, which could be either rounding up or rounding down: often useful, but not what I want here.

    round(mpg - 2.5, 5)

seems all right, but also a little too much like a dodge.

Similarly, the solution could be the function call

    -recode(-mpg,-40,-35,-30,-25,-20,-15,-10)

but that's a bit backward for my taste. Note all the negative signs in the above: negating and then negating again to reverse it are made necessary by the fact that -recode()- uses its numeric arguments as upper limits, i.e. it rounds up. However, this is not the same as

    recode(mpg,15,20,25,30,35,40,45) - 5

as with the latter, values of exactly 15 20 ... get mapped to 10 15 ... , again not what I want.

    recode(mpg,14,19,24,29,34,39,44) - 4

fixes that, but I find it a bit too much like thinking to have to work that out, especially on the fly, and it doesn't generalise easily to non-integers so far as I can see. (Subtract 4.9, or 4.99, etc. and you could run into precision problems.)

-egen, cut()- offers another solution:

    egen ... = cut(mpg), at(10(5)45)

Being able to specify a numlist is nice here, as compared with spelling out a comma-separated list, but you _must_ add a limit here (45) which will not be used; otherwise with

    egen ... = cut(mpg), at(10(5)40)

your highest class will be missing (_not_ 40). There was some discussion of this behaviour on Statalist several months ago; although the original authors of -cut()- (Michael Hills and David Clayton) must have had a reason for implementing -cut()- in this manner, which was echoed in the adoption by Stata Corp, I don't find this behaviour intuitive. For some reason, I think of this 45 as like the piece of meat the hero(ine) has to throw to the guard dog to avoid being bitten (or worse).

My favourite is none of these but

    5 * floor(mpg/5)

Here -floor()- always rounds down to the largest integer less than or equal to its argument. The name floor is due to Kenneth E. Iverson, the principal architect of APL, who introduced it some time before 1962. As it happens

    5 * int(mpg/5)

gives exactly the same result for -mpg- in the auto data, but -int()- truncates towards zero, so the two differ for negative arguments. In general, whenever variables may be negative as well as positive,

    interval * floor(expression/interval)

gives a more consistent classification.

This solution needs a little thinking to appreciate, but grows on one, and it has the merit that you don't need to spell out all the possible end values (with the risk of forgetting some or mistyping some). (-recode()- and -egen, cut()- are not restricted to rounding in equal intervals and of course remain useful for more complicated problems.)

Without recapitulating the whole argument insofar as it applies to rounding up, -floor()-'s sibling -ceil()- (short for _ceiling_) is a nice way of rounding up in equal intervals:

    interval * ceil(expression/interval)

and is easier to work with than expressions based on -int()-.

I have written -egen- functions -down()- and -up()- for which the calls would be (e.g.)

    egen ... = down(mpg,5)

but I incline to thinking that there is little pain and much gain in learning how to do it with -floor()- and -ceil()-.

Any comments?

Nick
n.j.cox@durham.ac.uk

P.S. My proposal is to generalise -floor()- so that it may take two arguments, in which case

    floor(expression, #)

is

    # * floor(expression / #)

and similarly for -ceil()-.
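
For anyone wanting to check the comparison in the message directly, here is a minimal sketch, assuming the auto data shipped with Stata (loaded with -sysuse auto-). The variable names mpg_floor, mpg_round, mpg_recode, mpg_cut and mpg_up are made up for illustration; only the expressions themselves come from the message. It is meant as a quick check, not a definitive treatment.

```stata
* Sketch only: compare the round-down approaches on the auto data.
* All new variable names below are illustrative assumptions.
sysuse auto, clear

* Rounding down in multiples of 5 with floor()
generate mpg_floor = 5 * floor(mpg/5)

* The alternatives discussed in the message
generate mpg_round  = round(mpg - 2.5, 5)
generate mpg_recode = -recode(-mpg, -40, -35, -30, -25, -20, -15, -10)
egen     mpg_cut    = cut(mpg), at(10(5)45)

* For -mpg- (integers between 12 and 41) all four should agree
assert mpg_floor == mpg_round
assert mpg_floor == mpg_recode
assert mpg_floor == mpg_cut

* Rounding up in multiples of 5 with ceil()
generate mpg_up = 5 * ceil(mpg/5)

* With a negative argument, floor() rounds down but int() truncates
* towards zero: -7/5 is -1.4, so the first line shows -10, the second -5
display 5 * floor(-7/5)
display 5 * int(-7/5)
```

The -assert- lines only confirm agreement for this particular variable; with fractional or negative values the -round()- and -recode()- variants would need rethinking, which is the point of preferring -floor()-.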
