# RE: st: RE: Percentiles: Identify of smallest observation closet to percentile value

 From "Nick Cox" To Subject RE: st: RE: Percentiles: Identify of smallest observation closet to percentile value Date Mon, 3 Feb 2003 20:08:28 -0000

```Rod Hunter

> I still don't appear to be out of the woods.
>
> 1.
> I applied Nick's procedure to my example:
> . centile var1 , c(75)
> --------------------------------------------------------------------
> 				-- Binom. Interp. --
> 	Variable	Obs  Percentile	Centile	[95% Conf. Interval]
>
> 	var1	4         75	9.75	7.45          10*
>
> *	Lower (upper)	confidence limit held	at minimum
> (maximum) of sample
> -----------------------------------------------------------
> . ret li
> ------------------------------------------------
> scalars:
> 	r(n_cent)	=	1
> 	r(N)	=	4
> 	r(ub_1)	=	10
> 	r(lb_1)	=	7.450000000008672
> 	r(c_1)	=	9.75
> macros:
> 	r(centiles)	:	"75"
> -------------------------------------------------
> . list var1 if var1 == `r(c_1)'
> -----------------------------------------------
> 	var1
> .
> -----------------------------------------------
> This does not give me the identity nor value of the largest
> observation
> within the percentage of the observations included by the
> 75th percentile.
>
> 2.
> I tried Ronnie's approach:
>
> . sum var1, d
> -------------------------------------------
> 		var1
> 	Percentiles	Smallest
> 1%	7	7
> 5%	7	8
> 10%	7	9	Obs	4
> 25%	7.5	10	Sum of Wgt.	4
>
> 50%	8.5		Mean	8.5
> 		Largest	Std. Dev.	1.290994
> 75%	9.5	7
> 90%	10	8	Variance	1.666667
> 95%	10	9	Skewness	0
> 99%	10	10	Kurtosis	1.64
> ---------------------------------------------------
> . local macro*
> . list var1 if var1 == `p75'
> -------------------------------------------------
> 	var1
> -------------------------------------------------
>
> 3.
> This is what I have been using:
>
> . _pctile  var1,p(75)
> . gen var1_p75=r(r1)
> . tabstat var1 if var1==var1_p75, stat(max)
> -------------------------------------
> no observations
> -------------------------------------
> alternatively:
> . list var1 if var1==var1_p75
> ---------------------------------------
>   	var1
> ---------------------------------------

1. if a percentile is ever calculated by interpolation
between neighbouring order statistics, then no value is ever equal to
it.

2. the precision problem whereby binary <-> decimal
operations need care.

Try this. It seems a bit complicated for the problem;
someone may have a simpler solution.

program def nearpctile, rclass
version 7
syntax varname [if] [in], Percentiles(numlist)
marksample touse
_pctile `varlist' if `touse', p(`percentiles')
tempvar id diff
tempname whichobs
gen long `id' = _n
qui {
gen double `diff' = /*
*/ abs(`varlist' - r(r1)) if `touse'
su `diff', meanonly
tab `id' if `diff' == r(min), matrow(`whichobs')
local n = r(r)
forval i = 1 / `n' {
local ob = `whichobs'[`i',1]
local obs "`obs'`ob' "
}
}
di as txt "`obs'"
return local obs "`obs'"
end

Example:

. _pctile mpg, p(75)

. ret li

scalars:
r(r1) =  25

. l mpg if mpg == 25

+-----+
| mpg |
|-----|
44. |  25 |
55. |  25 |
61. |  25 |
72. |  25 |
73. |  25 |
+-----+

. nearpctile mpg, p(75)
44 55 61 72 73

. ret li

macros:
r(obs) : "44 55 61 72 73 "

This also seems to work in messier problems,
e.g. 75th percentile of gear_ratio.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```