


RE: st: How does Stata calculate percentiles?

From   Nick Cox
Subject   RE: st: How does Stata calculate percentiles?
Date   Sun, 24 Oct 2010 17:42:23 +0100

Phil gave a good answer. However, once you have the -centile- result in memory, 

local cutpoint=r(c_1)
recode price (min/`cutpoint'=0) (`cutpoint'/max=1), gen(pricecat)

can be replaced by one line 

gen pricecat = (price > r(c_1)) if !missing(price)

which yields 0, 1 and numeric missing as appropriate.

On the other hand, why do you want to throw away information like this? 


Phil Clayton

One way to do it would be to obtain the centile using the -centile- command, then -recode- the variable to create the indicator variable.

sysuse auto
centile price, centile(33)
local cutpoint=r(c_1)
recode price (min/`cutpoint'=0) (`cutpoint'/max=1), gen(pricecat)

If you wanted the indicator variable to be 1 if the variable is >= the cutpoint (as opposed to >), swap the order of the two recoding rules (once a value matches one rule, the subsequent rules are ignored for it).
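
As a sketch of the swapped version (the name pricecat2 is only illustrative and is not from the original post), something like this should code values equal to the cutpoint as 1:

```stata
sysuse auto, clear
centile price, centile(33)
local cutpoint = r(c_1)
* The rule listed first wins for a value equal to the cutpoint,
* so here price == cutpoint is coded 1 rather than 0.
recode price (`cutpoint'/max=1) (min/`cutpoint'=0), gen(pricecat2)
* Show the counts, including any missing values
tabulate pricecat2, missing
```

If the cutpoint falls between two observed prices, this gives the same result as the original rule order; the two versions differ only for observations exactly at the cutpoint.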

See the manual entry for -centile- for how the centile is calculated; the method is pretty standard. With regard to "su varname, d", see -help summarize- and the manual entry for -summarize-.
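
To make the calculation concrete, here is a rough sketch that reproduces the -centile- result by hand. It assumes the default -centile- method as I understand it from the manual: with n nonmissing values sorted in ascending order, R = (n+1)*q/100, r = int(R), f = R - r, and the centile is x[r] + f*(x[r+1] - x[r]). The scalar names c_stata and c_hand are just illustrative. The same formula is used whether n is odd or even: for the median (q = 50), an odd n gives f = 0 and hence the middle value, while an even n gives f = 0.5 and hence the average of the two middle values.

```stata
sysuse auto, clear
centile price, centile(33)
scalar c_stata = r(c_1)

* Hand calculation under the assumed default formula.
* Sorting puts any missing values last, so the first n rows are nonmissing.
sort price
quietly count if !missing(price)
local n = r(N)
local R = (`n' + 1)*33/100
local r = floor(`R')
local f = `R' - `r'
* Interpolate between the r-th and (r+1)-th ordered values.
* (Edge cases r == 0 and r >= n are not handled in this sketch.)
scalar c_hand = price[`r'] + `f'*(price[`r' + 1] - price[`r'])

* The two values should agree
display c_stata "  " c_hand
```

Note that -summarize, detail- uses a different percentile rule from -centile-, so its percentiles need not match this calculation exactly.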

On 24/10/2010, at 3:27 PM, Grace Jessie wrote:

> I want to generate a new variable equal to 1 if another variable is greater than its 100/3 percentile and 0 otherwise. How do I get the 100/3th percentile of a variable?
> And how does Stata calculate percentiles if the number of observations is odd or even?
> Additionally, what does the output "smallest and largest" mean after "su varname,d"?
