The Stata listserver

st: How SHOULD cut behave


From   Michael Hills <mhills@blueyonder.co.uk>
To   stata list <statalist@hsphsun2.harvard.edu>
Subject   st: How SHOULD cut behave
Date   Thu, 8 Aug 2002 22:18:21 +0100

After the flurry of crossing posts on this topic, finally put to bed
by Bill Gould's very clear reply, perhaps it is worth airing the
question of how cut SHOULD behave.

In the original version, the result of tabulating newvar after

egen newvar=cut(oldvar),at(25(5)45)

was

25-
30-
35-
40-

and, as I understand it, the complaint was that the numbers
25,30,35,40,45 are described as left-hand end-points, so that strictly
the output of tabulate should be

25-
30-
35-
40-
45-

in which the last group contains all non-missing values of oldvar that
are 45 or more. I confess that I don't like this, as I would have to
exclude 45- from all following work with newvar. Also, why is there not
a <25 group, which you might expect if there is a 45-and-over group?

Perhaps (as I think Jens suggested) the output

25-
30-
35-
40-45

would satisfy all parties. Only observations in [25,45) are included
and 45 is a not-included right-hand end. All observations outside
[25,45) are coded as missing on newvar.
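
For concreteness, the coding rule described above can be sketched by
hand, without relying on how any particular version of cut behaves. The
data values below are invented purely for illustration, and the
arithmetic assumes equally spaced cut-points 25(5)45:

```stata
* Illustrative data only: these oldvar values are made up for the sketch
clear
input oldvar
20
25
29.9
30
44.9
45
60
.
end

* Code each observation with the left-hand end of its 5-wide interval,
* but only for values in [25,45); everything else is left missing.
* A missing oldvar fails the test oldvar < 45, because Stata treats
* missing as larger than any number.
generate newvar = 25 + 5*floor((oldvar - 25)/5) if oldvar >= 25 & oldvar < 45

tabulate newvar, missing
```

With this rule 45 itself is excluded, so it plays the part of a
not-included right-hand end, and both the 20 and the 60 end up missing
on newvar.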

The output

[25-30)
[30-35)
[35-40)
[40-45)

would be even better, but the mathematician's convention that [25-30)
includes 25 but not 30 is not recognized in medicine and probably
not in economics either. 
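
Whatever cut itself prints, labels of this kind can be attached
afterwards with value labels. The following is only a sketch: the label
name cutlbl is hypothetical, and the four observations simply stand in
for a newvar already coded 25, 30, 35, 40:

```stata
* Stand-in data: one observation per group, coded by left-hand end-point
clear
set obs 4
generate newvar = 20 + 5*_n      // gives 25, 30, 35, 40

* cutlbl is a hypothetical label name; the label text is free-form
label define cutlbl 25 "[25-30)" 30 "[30-35)" 35 "[35-40)" 40 "[40-45)"
label values newvar cutlbl

tabulate newvar
```

Since the label text is free-form, anyone who dislikes the bracket
notation could substitute something like "25-" or "40-45" in the label
define line.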

-- 
Michael Hills

mhills@blueyonder.co.uk

