st: a note about cut()

From   Rebecca Pope
Subject   st: a note about cut()
Date   Mon, 7 Jan 2013 16:53:06 -0600

This post is just to (re-)alert other users to something that bit me today.

The Stata 12 help for -egen cut()- says that cut() will create a new
variable "coded with the left-hand ends of the grouping intervals"
where the values are given by at(). However, the help is incomplete in
one crucial respect.

As acknowledged in a post by Bill Gould from 2002:
"In adopting -cut()-, StataCorp wrote its own inelegant description of
the function, and in that description, one would not suspect that
nonmissing values above the final cutpoint would be mapped to
missing."

The reasons for cut()'s behavior were discussed by Bill in that post.
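
To make the behavior concrete, here is a minimal sketch on a made-up
dataset (the variable names and cutpoints are hypothetical, not taken
from Bill's post). With at(0,18,40,65) the intervals are [0,18),
[18,40) and [40,65), so an age of 65 or more ends up missing:

```stata
* Toy data, invented for illustration only
clear
input age
5
17
18
40
65
90
end

* Code age by the left-hand ends of the intervals given in at()
egen agegrp = cut(age), at(0,18,40,65)

* Ages at or above the final cutpoint (65) are mapped to missing
assert missing(agegrp) if age >= 65
assert !missing(agegrp) if age < 65

list age agegrp
```

Running -count if missing(agegrp) & !missing(age)- after any call to
cut() is a cheap way to catch this before it bites.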

*If* you are aware of this behavior, it's easy to fool cut() into
keeping the highest nonmissing values in the new variable by making
the last number in the at() list a high number that doesn't exist in
your data. I think it is also good practice to choose one that cannot
logically exist for the variable even if you get new data, e.g. age in
years of 999. This will of course present challenges for certain
variables that do not have a natural upper limit, e.g. income. In
response to a similar post from 2010, Nick Cox also proposed -ceil()-
and -floor()- as a solution. Finally, with a small number of integers,
-recode, generate()- is also an option.
I hope this is helpful.
