Statalist



From   "Dedman, Dan" <D.Dedman@ljmu.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Quantile question
Date   Fri, 29 Feb 2008 15:40:35 -0000

We want to agree on a method for producing quantiles so we are all
working to the same algorithm. I was intrigued by the way Stata does it
and wondered where this came from and what the justification is.

If we have 150 observations to be grouped into quintiles, this is easy:
30 in each group. But what if we had 151, 152, 153 or 154 observations?

This is how Stata 9 does it using -xtile- :

xtile newvar=rank, nquantiles(5)
	
Group sizes, one column per total number of observations:

------------------------------------
q1       31      31      31      31
q2       30      30      31      31
q3       30      31      30      31
q4       30      30      31      31
q5       30      30      30      30
------------------------------------
All     151     152     153     154


and using the -cut- function from -egen- :

egen q2=cut(rank), group(5)

Group sizes, one column per total number of observations:

------------------------------------
q0       30      30      30      30
q1       30      30      31      31
q2       30      31      30      31
q3       30      30      31      31
q4       31      31      31      31
------------------------------------
All     151     152     153     154
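
To rebuild any one column of either table, here is a minimal sketch. It
assumes that rank holds the distinct values 1 to N, with no ties and no
missing values; that is an assumption made for illustration, and tied
values could shift the counts. Change 151 to 152, 153 or 154 for the
other columns.

```stata
* Sketch only: rebuild the group counts for one N (here 151).
* Assumes rank takes the distinct values 1..N (no ties, no missings).
clear
set obs 151
generate rank = _n

* The two commands from this post
xtile newvar = rank, nquantiles(5)
egen q2 = cut(rank), group(5)

* Group sizes: newvar is coded 1..5, q2 is coded 0..4
tabulate newvar
tabulate q2

* The quantile cutpoints for the same variable, for comparison
_pctile rank, nquantiles(5)
return list
```

The last two lines list the four quantiles of rank, which can be set
beside the -xtile- group boundaries.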

So the two methods work in opposite directions, but are otherwise
consistent in where they place the 'extra' 1 to 4 observations. 
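
The pattern in both tables can be summarised by a simple rounding rule.
This is only an observation fitted to the counts above, not a statement
of how either command is implemented: for N distinct values and
k = 1,...,5, the cumulative group boundaries sit at ceil(N*k/5) in the
-xtile- table and at floor(N*k/5) in the -egen cut- table. A short
check that needs no data:

```stata
* Sketch only: group sizes implied by rounding the boundaries N*k/5
* up (ceil) or down (floor). Fitted to the tables in this post; not
* taken from the source code of -xtile- or -egen-.
forvalues N = 151/154 {
    forvalues k = 1/5 {
        local up   = ceil(`N'*`k'/5)  - ceil(`N'*(`k'-1)/5)
        local down = floor(`N'*`k'/5) - floor(`N'*(`k'-1)/5)
        display "N=`N'  group `k':  ceil rule " `up' "   floor rule " `down'
    }
}
```

The rule uses only ceil and floor, so it could be written down in any
package, which may help colleagues who do not use Stata.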

I am quite happy to adopt the Stata approach, but some of my colleagues
do not use Stata, so I would like to be able to describe how the Stata
algorithm works and why Stata does it this way rather than any other
way. Is this a general convention, is it easier to justify statistically
or otherwise, or is it just a case of finding a way that works and
sticking with it?

Many thanks


Daniel Dedman
Public Health Information Analyst/Project Manager
North West Public Health Observatory
