Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: RE: Percentiles: Identify of smallest observation closet to percentile value


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Percentiles: Identify of smallest observation closet to percentile value
Date   Mon, 3 Feb 2003 20:08:28 -0000

Rod Hunter

> I still don't appear to be out of the woods.
>
> 1.
> I applied Nick's procedure to my example:
> . centile var1 , c(75)
> --------------------------------------------------------------------
> 				-- Binom. Interp. --
> 	Variable	Obs  Percentile	Centile	[95% Conf. Interval]
>
> 	var1	4         75	9.75	7.45          10*
>
> *	Lower (upper)	confidence limit held	at minimum
> (maximum) of sample
> -----------------------------------------------------------
> . ret li
> ------------------------------------------------
> scalars:
> 	r(n_cent)	=	1
> 	r(N)	=	4
> 	r(ub_1)	=	10
> 	r(lb_1)	=	7.450000000008672
> 	r(c_1)	=	9.75
> macros:
> 	r(centiles)	:	"75"
> -------------------------------------------------
> . list var1 if var1 == `r(c_1)'
> -----------------------------------------------
> 	var1
> .
> -----------------------------------------------
> This does not give me the identity nor value of the largest
> observation
> within the percentage of the observations included by the
> 75th percentile.
>
> 2.
> I tried Ronnie's approach:
>
> . sum var1, d
> -------------------------------------------
> 		var1
> 	Percentiles	Smallest
> 1%	7	7
> 5%	7	8
> 10%	7	9	Obs	4
> 25%	7.5	10	Sum of Wgt.	4
>
> 50%	8.5		Mean	8.5
> 		Largest	Std. Dev.	1.290994
> 75%	9.5	7
> 90%	10	8	Variance	1.666667
> 95%	10	9	Skewness	0
> 99%	10	10	Kurtosis	1.64
> ---------------------------------------------------
> . local macro*
> . list var1 if var1 == `p75'
> -------------------------------------------------
> 	var1
> -------------------------------------------------
>
> 3.
> This is what I have been using:
>
> . _pctile  var1,p(75)
> . gen var1_p75=r(r1)
> . tabstat var1 if var1==var1_p75, stat(max)
> -------------------------------------
> no observations
> -------------------------------------
> alternatively:
> . list var1 if var1==var1_p75
> ---------------------------------------
>   	var1
> ---------------------------------------


Thinking about this further, there are two possible issues here:

1. if a percentile is ever calculated by interpolation
between neighbouring order statistics, then no value is ever equal to
it.

2. the precision problem whereby binary <-> decimal
operations need care.

Try this. It seems a bit complicated for the problem;
someone may have a simpler solution.

program def nearpctile, rclass
	version 7
	syntax varname [if] [in], Percentiles(numlist)
	marksample touse
	_pctile `varlist' if `touse', p(`percentiles')
	tempvar id diff
	tempname whichobs
	gen long `id' = _n
	qui {
		gen double `diff' = /*
		*/ abs(`varlist' - r(r1)) if `touse'
		su `diff', meanonly
		tab `id' if `diff' == r(min), matrow(`whichobs')
		local n = r(r)
            forval i = 1 / `n' {
	            local ob = `whichobs'[`i',1]
			local obs "`obs'`ob' "
            }
	}
	di as txt "`obs'"
	return local obs "`obs'"
end

Example:

. _pctile mpg, p(75)

. ret li

scalars:
                r(r1) =  25

. l mpg if mpg == 25

     +-----+
     | mpg |
     |-----|
 44. |  25 |
 55. |  25 |
 61. |  25 |
 72. |  25 |
 73. |  25 |
     +-----+

. nearpctile mpg, p(75)
44 55 61 72 73

. ret li

macros:
               r(obs) : "44 55 61 72 73 "

This also seems to work in messier problems,
e.g. 75th percentile of gear_ratio.




Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index