The Stata listserver

st: rounding down (and up)


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: rounding down (and up)
Date   Sun, 22 Jun 2003 18:45:13 +0100

Here is a little nitty-gritty problem, 
and I do know Stata solutions. My interest 
is whether there are others I have missed, 
and above all in views on what is most 
natural as a solution, and what has fewest 
possible disadvantages or side-effects. (As a secondary 
detail I have a proposal for generalising 
two existing Stata functions.) 

I want to round down, in multiples of 
some (fixed) number. For concreteness, say 
I want to round -mpg- in the auto data 
in multiples of 5, so that any values 
10-14 get rounded to 10, any values 15-19 
to 15, etc. (-mpg- is simple in that 
only integer values occur; in many other 
cases we clearly have fractional parts to think 
about as well.)  

Note that the solution is _not_ the function call 

	round(mpg, 5) 

as this rounds to the nearest multiple 
of 5, which could be either rounding up 
or rounding down: often useful, but 
not what I want here. 
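The two values below are chosen only for illustration. They show -round()- going either way at the command line:

```stata
* round(x, 5) goes to the nearest multiple of 5
display round(12, 5)    // 10: this one goes down
display round(13, 5)    // 15: this one goes up
```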

	round(mpg - 2.5, 5) 

seems all right, but also a little too 
much like a dodge. 
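Here is a sketch that assumes the auto data shipped with Stata, loaded by -sysuse auto-. The variable names -mpg_dodge- and -mpg_floor- are made up for the example. For these integer data, the dodge should agree with the -floor()- solution discussed below. Note that multiples of 5 become ties such as 12.5, so the trick relies on -round()- sending halves upwards:

```stata
sysuse auto, clear
* the "dodge": shift down by half the interval, then round to nearest
generate mpg_dodge = round(mpg - 2.5, 5)
* the floor() approach, for comparison
generate mpg_floor = 5 * floor(mpg/5)
* stops with an error if the two ever disagree
assert mpg_dodge == mpg_floor
```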

Alternatively, the solution could be the function call 

	-recode(-mpg,-40,-35,-30,-25,-20,-15,-10) 

but that's a bit backward for my taste. 
Note all the negative signs in the above: negating 
the variable, and then negating the result to reverse 
that, are made necessary by the fact that -recode()- uses 
its numeric arguments as upper limits, 
i.e. it rounds up. However, this is not the same as 

	recode(mpg,15,20,25,30,35,40,45) - 5 

as with the latter, values of exactly 15, 20, ... 
get mapped to 10, 15, ..., again not what 
I want. 
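This sketch compares the two -recode()- versions on the auto data (-sysuse auto-). The second uses the full list of limits 15(5)45. The names -mpg_neg- and -mpg_shift- are hypothetical. The two should agree except at exact multiples of 5, where the shifted version comes out 5 too low:

```stata
sysuse auto, clear
* negate, round up with recode(), negate back: this rounds down
generate mpg_neg = -recode(-mpg, -40, -35, -30, -25, -20, -15, -10)
* round up, then subtract 5: wrong at exact multiples of 5
generate mpg_shift = recode(mpg, 15, 20, 25, 30, 35, 40, 45) - 5
list mpg mpg_neg mpg_shift if mod(mpg, 5) == 0
```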

	recode(mpg,14,19,24,29,34,39,44) - 4

fixes that, but I find it a bit too 
much like thinking to have to work that 
out, especially on the fly, and it doesn't 
generalise easily to non-integers so far 
as I can see. (Subtract 4.9, or 4.99, etc., 
and you could run into precision problems.) 
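The snag with non-integers shows up with single values, chosen here only for illustration. The value 14.5 ought to go to 10, but the shifted -recode()- sends it to 15, while -floor()- (discussed below) gets it right:

```stata
* with integers the limits 14, 19, ... behave
display recode(14, 14, 19, 24, 29, 34, 39, 44) - 4      // 10
display recode(15, 14, 19, 24, 29, 34, 39, 44) - 4      // 15
* but 14.5 lies above the limit 14, so it is sent to 15, not 10
display recode(14.5, 14, 19, 24, 29, 34, 39, 44) - 4    // 15
display 5 * floor(14.5/5)                               // 10
```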

-egen, cut()- offers another solution:  

	egen ... = cut(mpg), at(10(5)45) 

Being able to specify a numlist is nice here, 
as compared with spelling out a comma-separated
list, but you _must_ add a limit here (45) which 
will never itself be used as a class; otherwise with 

	egen ... = cut(mpg), at(10(5)40) 

your highest class will be missing (_not_ 40), 
as values at or above the last limit are set to missing. 
There was some discussion of this behaviour 
on Statalist several months ago. The original 
authors of -cut()- (Michael Hills and David 
Clayton) must have had a reason for implementing 
it in this manner, and StataCorp kept it when 
adopting the function, but I don't find this behaviour 
intuitive. 
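This sketch on the auto data (-sysuse auto-; the new variable names are hypothetical) shows the effect of the spare limit. With at(10(5)40), any value of 40 or more should come out missing:

```stata
sysuse auto, clear
* last break 40: values of 40 or more become missing
egen mpg_cut40 = cut(mpg), at(10(5)40)
* spare break 45: values from 40 up to (not including) 45 are coded 40
egen mpg_cut45 = cut(mpg), at(10(5)45)
count if missing(mpg_cut40)
count if missing(mpg_cut45)
```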

For some reason, I think of this 45 as like 
the piece of meat the hero(ine) has to throw 
to the guard dog to avoid being bitten (or 
worse). 

My favourite is none of these but 

	5 * floor(mpg/5) 

Here -floor()- returns the largest integer less 
than or equal to its argument, i.e. it always rounds down. 
The name floor is due to Kenneth E. Iverson, the principal 
architect of APL, who had introduced it by 1962. 
As it happens, 

	5 * int(mpg/5) 

gives exactly the same result for -mpg- in the auto 
data. But -int()- truncates towards zero, so it rounds 
negative values up, not down; in general, whenever 
variables may be negative as well as positive, 

	interval * floor(expression/interval) 

gives a more consistent classification. 
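A few values, chosen only for illustration, make the difference concrete. The value -7 belongs in the class starting at -10, and only -floor()- puts it there:

```stata
* positive argument: int() and floor() agree
display 5 * floor(7/5)     // 5
display 5 * int(7/5)       // 5
* negative argument: -7 belongs in the class starting at -10
display 5 * floor(-7/5)    // -10
display 5 * int(-7/5)      // -5, since int() truncates towards zero
```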

This solution needs a little thinking to appreciate,  
but grows on one, and it has the merit that you don't need to 
spell out all the possible end values (with the risk 
of forgetting some or mistyping some). (-recode()- 
and -egen, cut()- are not restricted to rounding 
in equal intervals and of course remain useful for 
more complicated problems.) 
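The same recipe carries over to fractional values, with the interval written in only one place. This sketch uses -gear_ratio- from the auto data and an interval of 0.5 held in a local macro. The choice of 0.5 is an assumption, made because it is exactly representable in binary, and -gr_down- is a made-up name:

```stata
sysuse auto, clear
* keep the interval in one place; 0.5 is exact in binary floating point
local width 0.5
generate gr_down = `width' * floor(gear_ratio/`width')
* every value should sit in [gr_down, gr_down + width)
assert gr_down <= gear_ratio & gear_ratio < gr_down + `width'
tabulate gr_down
```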

Without recapitulating the whole argument insofar 
as it applies to rounding up, -floor()-'s sibling 
-ceil()- (short for _ceiling_) is a nice way 
of rounding up in equal intervals: 

	interval * ceil(expression/interval) 

and is easier to work with than expressions 
based on -int()-. 
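Here is the rounding-up version on the auto data, again as a sketch with a hypothetical variable name:

```stata
sysuse auto, clear
* round up in multiples of 5: 11-15 -> 15, 16-20 -> 20, ...
generate mpg_up = 5 * ceil(mpg/5)
* every value should satisfy mpg <= mpg_up < mpg + 5
assert mpg <= mpg_up & mpg_up < mpg + 5
```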

I have written -egen- functions -down()- 
and -up()- for which the calls would be (e.g.) 

	egen ... = down(mpg,5) 

but I incline to think that there is 
little pain and much gain in learning 
how to do it with -floor()- and -ceil()-. 

Any comments? 

Nick 
n.j.cox@durham.ac.uk 

P.S. My proposal is to generalise 
-floor()- so that it may take two 
arguments, in which case 

	floor(expression, #)

is 

	# * floor(expression / #) 

and similarly for -ceil()-. 
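The proposed two-argument behaviour can be mimicked with Mata, which Stata provides from version 9 onwards. This is only a sketch: the names -floorby()- and -ceilby()- are hypothetical and are not built-in functions:

```stata
* hypothetical two-argument floor and ceiling, as Mata functions
mata:
real matrix floorby(real matrix x, real scalar b)
{
    // round x down in multiples of b
    return(b :* floor(x :/ b))
}

real matrix ceilby(real matrix x, real scalar b)
{
    // round x up in multiples of b
    return(b :* ceil(x :/ b))
}

floorby((12, 15, 41), 5)    // 10, 15, 40
ceilby((12, 15, 41), 5)     // 15, 15, 45
end
```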




