The Stata listserver

st: RE: Sorting/ranking Q from new user


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Sorting/ranking Q from new user
Date   Fri, 1 Aug 2003 10:13:07 +0100

Eric VonDohlen asked:

> I have a continuous variable X, which I would like to:
>
> (a) sort in ascending or descending order;
> (b) rank the sorted X into some specified number of groups;
> (c) report the mean of X (or some other statistic) by group.

Jayesh Kumar replied and pointed to -gsort- for (a). Fine.
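
For concreteness, a minimal sketch of (a) on made-up data. The variable name x and its values are invented here purely for illustration; they are not Eric's data.

```stata
* Toy data: x stands in for Eric's continuous variable X (hypothetical values)
clear
input x
3
1
2
end

* Ascending order
sort x
list x

* Descending order: a minus sign before the variable name reverses the sort
gsort -x
list x
```

-sort- only sorts in ascending order, which is why -gsort- is the tool for descending order.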

On (b) and (c) Jayesh suggested

> *This will create percentiles, you can choose your own number of groups.
> *for ranking purpose:
> by year:gen a=_n
> bysort year: egen b=max(a)
> gen percentile_year=((a/b)*100)

> *for reporting summary statistics:
> bysort percentile_year: summarize year

This is an interesting approach, but it needs to be
followed by some fixes and a couple of warnings. I don't
think it is general enough to be the best answer
to Eric's question.

A small fix is that the first command depends on observations
being in the right -sort- order, so -bysort- is needed there
(and is not needed on the second command, because by then the
data are already sorted by -year-):

bysort year: gen a = _n
by year: egen b = max(a)
gen percentile_year = ((a/b)*100)

As a matter of Stata style only, this can be condensed to

bysort year : gen percentile_year = (_n/_N) * 100

The first major problem is that whatever is of interest
should be sorted within each -year- (if not, the
assignment of percentiles is quite arbitrary).

bysort year (whatever) : gen percentile_year = (_n/_N) * 100
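
A small sketch of that last command on made-up data may help. Here x plays the role of "whatever", and the years and values are invented for illustration only.

```stata
* Toy data, deliberately entered out of order within each year
clear
input year x
2001 30
2001 10
2001 20
2001 40
2002 8
2002 5
2002 7
2002 6
end

* Sort by year, and by x within year, before numbering observations
bysort year (x) : gen percentile_year = (_n/_N) * 100

list, sepby(year)
```

With four observations per year, the smallest x in each year gets 25 and the largest gets 100. Without the (x), the values 25 to 100 would simply follow whatever order the observations happened to be in.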

Two other major problems:

* No adjustment for ties. Tied values will get
assigned to different percentiles.

* This works best when there is an equal number
of observations within each group, and less well otherwise.
(Suppose there were 4 observations in each -year-.
-percentile_year- would take on values 25, 50, 75, 100.)
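
Both problems can be seen in a tiny made-up example. The data below are invented for illustration: 2001 has four observations with ties, 2002 has only two.

```stata
* Toy data: ties within 2001, and unequal numbers of observations by year
clear
input year x
2001 10
2001 10
2001 20
2001 20
2002 5
2002 15
end

bysort year (x) : gen percentile_year = (_n/_N) * 100

list, sepby(year)
```

In 2001 the two observations with x equal to 10 get 25 and 50, although they are tied. In 2002 the only values that can occur are 50 and 100, so the same value of -percentile_year- does not pick out comparable groups across years.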

A more general answer to Eric's question is to use
-xtile- and then -summarize-, -tabstat-, etc. (and
to read the manual; new users are expected to read the
manual like everybody else!).
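
As a rough sketch only, (b) and (c) might look like this. The data are placeholders (x = 1, ..., 100), and the variable name group and the choice of 4 groups are assumptions made for illustration.

```stata
* Placeholder data: 100 observations with x = 1, ..., 100
clear
set obs 100
gen x = _n

* (b) Assign each observation to one of 4 quantile-based groups
xtile group = x, nquantiles(4)

* (c) Report the mean of x and the number of observations by group
tabstat x, by(group) statistics(mean n)
```

Other statistics can be requested through the statistics() option of -tabstat-, and nquantiles() sets the number of groups.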

Nick
n.j.cox@durham.ac.uk



