


RE: st: How does Stata calculate percentiles?

From   Nick Cox
Subject   RE: st: How does Stata calculate percentiles?
Date   Sun, 24 Oct 2010 17:42:23 +0100

Phil gave a good answer. However, once you have the -centile- result in memory, 

local cutpoint=r(c_1)
recode price (min/`cutpoint'=0) (`cutpoint'/max=1), gen(pricecat)

can be replaced by one line 

gen pricecat = (price >= r(c_1)) if !missing(price)

which yields 0, 1 and numeric missing as appropriate.

On the other hand, why do you want to throw away information like this? 


Phil Clayton wrote:

One way to do it would be to obtain the centile using the -centile- command, then -recode- the variable to create the indicator variable.

sysuse auto
centile price, centile(33)
local cutpoint=r(c_1)
recode price (min/`cutpoint'=0) (`cutpoint'/max=1), gen(pricecat)

If you wanted the indicator variable to be 1 if the variable is >= the cutpoint (as opposed to >), swap the two recoding rules (once one rule is matched, the subsequent rules are ignored).
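As a sketch of the swapped version described above: the rule for the upper range is listed first, so a value exactly equal to the cutpoint matches it and is coded 1. This uses the same auto dataset, and the variable name pricecat_ge is only an illustration.

```stata
sysuse auto, clear
centile price, centile(33)
local cutpoint = r(c_1)
* Upper rule first: a price exactly equal to the cutpoint matches this rule and gets 1.
* Everything below the cutpoint falls through to the second rule and gets 0.
recode price (`cutpoint'/max=1) (min/`cutpoint'=0), gen(pricecat_ge)
* Check the split
tabulate pricecat_ge
```

Missing values of price match neither rule, so they stay missing in the new variable.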

See the manual for -centile- to see how it's calculated. It's pretty standard. With regard to "su varname, d", see -help summarize- and the manual for -summarize-.
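For a rough idea of what the default calculation looks like, here is a sketch that reproduces the 33rd centile by hand. It assumes the default method is linear interpolation at position P = (n+1)p/100 between adjacent order statistics, which is my reading of the -centile- manual entry; check the manual before relying on it. The local names n, P, r and f are only for illustration, and the sketch assumes no missing values and 1 <= P < n.

```stata
sysuse auto, clear
sort price
* Assumed default rule: position P = (n+1)*p/100, then interpolate linearly
local n = _N
local P = (`n' + 1) * 33 / 100
local r = floor(`P')
local f = `P' - `r'
* Hand-computed centile: x[r] + f*(x[r+1] - x[r])
display price[`r'] + `f' * (price[`r' + 1] - price[`r'])
* Compare with what -centile- reports
centile price, centile(33)
display r(c_1)
```

Under this rule there is no special case for odd or even n. For the median, P = (n+1)/2, so an odd n lands exactly on the middle observation and an even n gives f = 0.5, the average of the two middle observations.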

On 24/10/2010, at 3:27 PM, Grace Jessie wrote:

> I want to generate a new variable equaling 1 if the other variable is greater than its 100/3 percentile and 0 otherwise. How do I get the 100/3th percentile of a variable?
> And how does Stata calculate percentiles if the number of observations is odd or even?
> Additionally, what does the output "smallest and largest" mean after "su varname,d"?
