Stop Kit Baum killing things, even vegetables
Here's another way. First, some specifications
for a good approach to this problem:
1. Control over the definition of percentile point.
You may not like wired-in definitions, and there
are several with slightly different rationales.
(As the old joke goes, the great thing about
standards is that there are so many to choose
from.)
2. Works sensibly with ties.
3. Adapts sensibly to the presence of missing values.
4. Can be applied with -by()- or -by:-.
That sounds a tough call, but all are possible.
Most of the explanation is already set out at
How can I calculate percentile ranks?
http://www.stata.com/support/faqs/stat/pcrank.html
to which -search percentile- points.
sysuse auto
egen n = count(mpg), by(foreign)        // (a)
egen rank = rank(mpg), by(foreign)      // (b)
gen pcp = 100 * (rank - 0.5) / n        // (c)
gen class = cond(pcp < 13, 1,  ///
            cond(pcp < 35, 2,  ///
            cond(pcp < 73, 3,  ///
            4))) if mpg < .             // (d)
In terms of (1), the recipe in (c) is just one of
many, and you can substitute your own.
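As a rough sketch of what substituting your own recipe might look like,
the lines below put the recipe in (c) next to one common alternative,
rank / (n + 1). The variable name pcp_w is made up for this illustration
and the choice of alternative is only an example, not a recommendation.

```stata
sysuse auto, clear
egen n = count(mpg), by(foreign)
egen rank = rank(mpg), by(foreign)

* Recipe (c): (rank - 0.5) / n
gen pcp = 100 * (rank - 0.5) / n

* One alternative definition: rank / (n + 1)
* (pcp_w is a hypothetical name used only in this sketch)
gen pcp_w = 100 * rank / (n + 1)

* Compare the two definitions side by side
sort foreign mpg
list foreign mpg rank n pcp pcp_w in 1/10, sepby(foreign)
```

Whichever definition you choose, the classification in (d) is unchanged
apart from the variable fed to -cond()-.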
In terms of (2), the -egen, rank()- function used
in (b) automatically adjusts for ties.
In terms of (3), the -egen- functions just leave
out missing values, but the command in (d) is
careful not to include them accidentally.
In terms of (4), (a) and (b) show how to do it.
The use of -cond()- in (d) shows an alternative
to Brent's code. That said, his code is crystal
clear and that's a good thing. Not everyone likes
-cond()-. I didn't like it much until I ended up writing
a tutorial on it with David Kantor, after which it
came to seem natural. -search cond()- will yield
the reference.
Moreover, I do think that Uli Kohler's function
-egen, xtile()-, and indeed -egenmore-, is a good thing.
Nick
n.j.cox@durham.ac.uk
Svend Juul
==========
The -egenmore- -xtile()- function does it. You may need first to
ssc install egenmore
. sysuse auto , clear
. egen pricegrp = xtile(price) , percentiles(13 35 73)
. tab1 pricegrp
-> tabulation of pricegrp

   pricegrp |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         10       13.51       13.51
          2 |         16       21.62       35.14
          3 |         29       39.19       74.32
          4 |         19       25.68      100.00
------------+-----------------------------------
      Total |         74      100.00
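If the groups should be formed separately within subsets of the data,
something like the following may work. It assumes that your installed
version of -egenmore- documents a by() option for -xtile()-; check
-help egenmore- before relying on it. The name pricegrp_by is made up
for this illustration.

```stata
* You may need first to: ssc install egenmore
sysuse auto, clear

* Cutpoints at the 13th, 35th and 73rd percentiles, computed
* separately for domestic and foreign cars
* (assumes by() is supported; pricegrp_by is a hypothetical name)
egen pricegrp_by = xtile(price), percentiles(13 35 73) by(foreign)

tabulate pricegrp_by foreign
```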
Brent Fulton
============
I am not sure how to do it in one step, but you could do the following:
xtile temp = var1, nq(100) /* creates new variable of integers with range
of 1 to 100 indicating percentiles */
gen new_var1=.
replace new_var1=1 if inrange(temp,1,12)
replace new_var1=2 if inrange(temp,13,34)
replace new_var1=3 if inrange(temp,35,72)
replace new_var1=4 if inrange(temp,73,100)
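A possible shortcut, offered only as a sketch: -_pctile- can return the
three cutpoints directly, and -irecode()- can then assign the groups in
one line. The auto data and the name pricegrp2 are used here purely for
illustration, and results at the cutpoints may differ slightly from the
-xtile, nq(100)- route above.

```stata
sysuse auto, clear

* Store the 13th, 35th and 73rd percentiles in r(r1), r(r2), r(r3)
_pctile price, percentiles(13 35 73)

* irecode() returns 0 if price <= r(r1), 1 if r(r1) < price <= r(r2),
* and so on, so adding 1 gives groups 1 to 4
* (pricegrp2 is a hypothetical name)
gen byte pricegrp2 = 1 + irecode(price, r(r1), r(r2), r(r3)) if price < .

tabulate pricegrp2, missing
```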
Paul Visintainer
================
Is there a way to create a new variable that uses cutpoints based on
unique percentiles of distribution? For example, suppose I want a new
variable with 4 groups at (or as close as the distribution will allow)
the following percentile cutpoints: 13th percentile, 35th percentile,
and the 73rd percentile? I've looked at -xtile-, -pctile-, and -egen-
and they don't seem to allow for this option.