
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


RE: st: How does Stata calculate percentiles?

From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: How does Stata calculate percentiles?
Date   Sun, 24 Oct 2010 17:42:23 +0100

Phil gave a good answer. However, once you have the -centile- result in memory, 

local cutpoint=r(c_1)
recode price (min/`cutpoint'=0) (`cutpoint'/max=1), gen(pricecat)

can be replaced by one line 

gen pricecat = (price >= r(c_1)) if !missing(price)

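which yields 0, 1 and numeric missing as appropriate. Note that >= codes a value exactly equal to the cutpoint as 1, whereas the -recode- rules in the order shown above code it as 0; use > instead of >= if you want the same result as the -recode- version.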
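Put together with the -centile- call from Phil's example, the one-line version might look like the sketch below. It assumes the auto dataset and the 33rd centile used in that example; the -byte- storage type and the final -tabulate- check are my additions, not part of the original suggestion.

```stata
sysuse auto, clear
centile price, centile(33)

* use r(c_1) straight away, before another r-class command overwrites it
generate byte pricecat = (price >= r(c_1)) if !missing(price)

* quick check of the result, showing missing values too if there are any
tabulate pricecat, missing
```

Referring to r(c_1) directly also avoids copying the cutpoint into a local macro first.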

On the other hand, why do you want to throw away information like this? 


Phil Clayton

One way to do it would be to obtain the centile using the -centile- command, then -recode- the variable to create the indicator variable.

sysuse auto
centile price, centile(33)
local cutpoint=r(c_1)
recode price (min/`cutpoint'=0) (`cutpoint'/max=1), gen(pricecat)

If you wanted the indicator variable to be 1 if the variable is >= the cutpoint (as opposed to >), swap the two recoding rules (once one rule is matched, the subsequent rules are ignored).
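As a sketch of the swapped version described above (the variable name pricecat_ge is only illustrative), listing the upper rule first means a value exactly equal to the cutpoint is matched by that rule and coded 1:

```stata
sysuse auto, clear
centile price, centile(33)
local cutpoint = r(c_1)

* upper range first: a price equal to the cutpoint matches this rule and gets 1
recode price (`cutpoint'/max=1) (min/`cutpoint'=0), gen(pricecat_ge)
```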

See the manual entry for -centile- for how the centile is calculated; the method is pretty standard. With regard to "su varname, d", see -help summarize- and the manual entry for -summarize-.
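For a rough idea of what the manual describes, the sketch below reproduces the default -centile- calculation by hand, as I understand it from the Methods and formulas section: with the n nonmissing values sorted, take R = (n + 1)*q/100, split it into integer part r and fractional part f, and interpolate between the r-th and (r+1)-th values. This is an illustration under stated assumptions, not a replacement for the manual: it assumes price has no missing values (true in the auto data) and that R lies between 1 and n. The local macro names are mine.

```stata
sysuse auto, clear
sort price

* default -centile- rule: R = (n + 1)*q/100, then linear interpolation
local R = (_N + 1) * 33 / 100
local r = floor(`R')
local f = `R' - `r'
display price[`r'] + `f' * (price[`r' + 1] - price[`r'])

* compare with the command itself
centile price, centile(33)
display r(c_1)

* -summarize, detail- reports its own percentiles, plus the four smallest
* and four largest values; its percentile rule can differ from -centile-
summarize price, detail
```

Because the rule interpolates on (n + 1)*q/100, odd and even numbers of observations are not treated as separate cases; what matters is whether R has a fractional part.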

On 24/10/2010, at 3:27 PM, Grace Jessie wrote:

> I want to generate a new variable equaling 1 if the other variable is greater than its 100/3 percentile and 0 otherwise. How do I get the 100/3th percentile of a variable?
> And how does Stata calculate percentiles if the number of observations is odd or even?
> Additionally, what does the output "smallest and largest" mean after "su varname,d"?
