Statalist The Stata Listserver



Re: Re: st: working with percentile cutpoints


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: Re: st: working with percentile cutpoints
Date   Tue, 22 May 2007 20:58:39 +0100

Stop Kit Baum killing things, even vegetables

Here's another way. First, some specifications
for a good approach to this problem:

1. Control over the definition of percentile point.
You may not like wired-in definitions, and there
are several with slightly different rationales.
(As the old joke goes, the great thing about
standards is that there are so many to choose
from.)

2. Works sensibly with ties.

3. Adapts sensibly to the presence of missing values.

4. Can be applied -by()- or -by:-.

That sounds a tough call, but all are possible.
Most of the explanation is already set out at

How can I calculate percentile ranks?
http://www.stata.com/support/faqs/stat/pcrank.html

to which -search percentile- points.

sysuse auto
egen n = count(mpg), by(foreign)              // (a)
egen rank = rank(mpg), by(foreign)            // (b)
gen pcp = 100 * (rank - 0.5) / n              // (c)

gen class = cond(pcp < 13, 1,      ///
            cond(pcp < 35, 2,      ///
            cond(pcp < 73, 3,      ///
                           4))) if mpg < .    // (d)

In terms of (1), the recipe in (c) is just one of
many, and you can substitute your own.
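
As a sketch of what substituting might look like, here are two
other rules alongside the one in (c). The names pcp_w and pcp_r
are made up for illustration; which rule suits is your call.

```stata
sysuse auto, clear
egen n = count(mpg), by(foreign)
egen rank = rank(mpg), by(foreign)

gen pcp   = 100 * (rank - 0.5) / n        // recipe (c): (i - 0.5) / n
gen pcp_w = 100 * rank / (n + 1)          // i / (n + 1), often called the Weibull rule
gen pcp_r = 100 * (rank - 1) / (n - 1)    // (i - 1) / (n - 1); needs n > 1 in each group

summarize pcp pcp_w pcp_r
```

The three differ most at the extremes of each group, which is
where a cutpoint such as 13 is likeliest to be affected in
small groups.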

In terms of (2), the -egen, rank()- function used
in (b) automatically adjusts for ties.
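
To see what the adjustment does: by default tied values share
the mean of the ranks they would otherwise occupy, and
-egen, rank()- also has -track- and -field- options for other
conventions. A small sketch, with illustrative variable names:

```stata
sysuse auto, clear
egen r_mean  = rank(mpg)            // default: tied values get their mean rank
egen r_track = rank(mpg), track     // 1 + number of values that are lower
egen r_field = rank(mpg), field     // 1 + number of values that are higher

sort mpg
list mpg r_mean r_track r_field in 1/10
```

With the default, the ranks still sum to n(n + 1)/2, which
is why the recipe in (c) stays centred on 50 within each group.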

In terms of (3), the -egen- functions just leave
out missing values, but the command in (d) is
careful not to include them accidentally.  	
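
Why the care matters: a missing percentile is not less than any
cutpoint, so without the qualifier it falls through every
-cond()- to the last class. A sketch using rep78 from the same
dataset, which has some missing values (bad and good are
illustrative names):

```stata
sysuse auto, clear
egen n = count(rep78), by(foreign)       // counts nonmissing values only
egen rank = rank(rep78), by(foreign)     // missing where rep78 is missing
gen pcp = 100 * (rank - 0.5) / n

* without the qualifier, missing pcp ends up in class 4
gen bad  = cond(pcp < 13, 1, cond(pcp < 35, 2, cond(pcp < 73, 3, 4)))
* with it, missing values stay missing
gen good = cond(pcp < 13, 1, cond(pcp < 35, 2, cond(pcp < 73, 3, 4))) if rep78 < .

tabulate bad good, missing
```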

In terms of (4), (a) and (b) show how to do it.
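
Specification 4 also mentions the -by:- prefix. As a sketch,
the same counts and ranks can be had with the prefix in place
of the by() option; both -egen- functions used here are
documented as allowing it.

```stata
sysuse auto, clear
bysort foreign: egen n = count(mpg)       // same as egen n = count(mpg), by(foreign)
bysort foreign: egen rank = rank(mpg)     // same as egen rank = rank(mpg), by(foreign)
gen pcp = 100 * (rank - 0.5) / n
```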

The use of -cond()- in (d) shows an alternative
to Brent's code. That said, his code is crystal
clear and that's a good thing. Not everyone likes
-cond()-. I didn't like it much until I ended up writing
a tutorial on it with David Kantor, after which it
came to seem natural. -search cond()- will yield
the reference.

Moreover, I do think that Uli Kohler's function
-egen, xtile()-, and indeed -egenmore-, is a good thing.

Nick
n.j.cox@durham.ac.uk


Svend Juul
==========

The -egenmore- -xtile()- function does it. You may need first to
    ssc install egenmore

. sysuse auto , clear
. egen pricegrp = xtile(price) , percentiles(13 35 73)
. tab1 pricegrp

-> tabulation of pricegrp

   pricegrp |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         10       13.51       13.51
          2 |         16       21.62       35.14
          3 |         29       39.19       74.32
          4 |         19       25.68      100.00
------------+-----------------------------------
      Total |         74      100.00
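
For anyone who cannot install -egenmore-, here is a rough
sketch of the same idea with official commands only. It
assumes the usual -xtile- convention that a value equal to a
cutpoint goes in the lower group; pricegrp2 and the scalar
names c1-c3 are illustrative.

```stata
sysuse auto, clear
_pctile price, percentiles(13 35 73)     // leaves cutpoints in r(r1), r(r2), r(r3)
scalar c1 = r(r1)
scalar c2 = r(r2)
scalar c3 = r(r3)

* group = 1 + number of cutpoints strictly below the value
gen byte pricegrp2 = 1 + (price > c1) + (price > c2) + (price > c3) if price < .
tabulate pricegrp2
```

This works on the whole dataset only; doing it by groups
would need a loop over the groups.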

Brent Fulton
============

I am not sure how to do it in one step, but you could do the following:

xtile temp = var1, nq(100)   /* creates new variable of integers with range
                                of 1 to 100 indicating percentiles */
gen new_var1=.
replace new_var1=1 if inrange(temp,1,12)
replace new_var1=2 if inrange(temp,13,34)
replace new_var1=3 if inrange(temp,35,72)
replace new_var1=4 if inrange(temp,73,100)

Paul Visintainer
================

Is there a way to create a new variable that uses cutpoints based on
unique percentiles of distribution?  For example, suppose I want a new
variable with 4 groups at (or as close as the distribution will allow)
the following percentile cutpoints:  13th percentile, 35th percentile,
and the 73rd percentile?  I've looked at -xtile-, -pctile-, and -egen-
and they don't seem to allow for this option.