st: RE: Sorting/ranking Q from new user

From: "Nick Cox"
Subject: st: RE: Sorting/ranking Q from new user
Date: Fri, 1 Aug 2003 10:13:07 +0100

```
Eric VonDohlen

> I have a continuous variable X, which I would like to:
>
> (a) sort in ascending or descending order;
> (b) rank the sorted X into some specified number of groups;
> (c) report the mean of X (or some other statistic) by group.

Jayesh Kumar replied and pointed to -gsort- for (a). Fine.

On (b) and (c) Jayesh suggested

> *This will create percentiles, you can choose your own number of groups.
> *for ranking purpose:
> by year:gen a=_n
> bysort year: egen b=max(a)
> gen percentile_year=((a/b)*100)

> *for reporting summary statistics:
> bysort percentile_year: summarize year

This is an interesting approach, but it needs to be
followed by some fixes and a couple of warnings. I don't
think it is general enough to be the best answer
to Eric's question.

A small fix: the first command depends on the observations
already being sorted by -year-, so -bysort- is
needed there (and not on the second command):

bysort year: gen a = _n
by year: egen b = max(a)
gen percentile_year = ((a/b)*100)

As a matter of Stata style only, this can be condensed to

bysort year : gen percentile_year = (_n/_N) * 100

The first major problem is that whatever is of interest
should be sorted within each -year- (if not, the
assignment of percentiles is quite arbitrary).

bysort year (whatever) : gen percentile_year = (_n/_N) * 100

Two other major problems:

* No adjustment for ties. Tied values will get
assigned to different percentiles.

* This works well only when there is an equal number
of observations within each group.
(Suppose there were 4 observations in each -year-.
-percentile_year- would take on values 25, 50, 75, 100.)

A more general answer to Eric's question is to use
-xtile- and then -summarize-, -tabstat-, etc. (and
read the manual like everybody else!).

Nick
n.j.cox@durham.ac.uk

```
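The -xtile- route suggested in the post might look something like the sketch below. It is an illustration only: the auto dataset shipped with Stata, the variable -price- (standing in for Eric's X), the choice of 5 groups, and the new variable names -price5- and -price5_by- are all assumptions made for the example, not part of the original thread. The loop at the end is one possible way to repeat the grouping within the levels of another variable (here -foreign-, playing the role of -year-).

```stata
* Sketch only: auto data and -price- stand in for Eric's X
sysuse auto, clear

* (a) sort in descending order (use "sort price" for ascending)
gsort -price

* (b) assign each observation to one of 5 quantile-based groups;
*     tied values of price are always placed in the same group
xtile price5 = price, nquantiles(5)

* (c) report statistics of price by group
tabstat price, by(price5) statistics(n mean sd min max)

* One way to form the groups separately within each level of foreign:
* run -xtile- on each subset and copy the result into a single variable
gen byte price5_by = .
levelsof foreign, local(levels)
foreach l of local levels {
    tempvar q
    xtile `q' = price if foreign == `l', nquantiles(5)
    replace price5_by = `q' if foreign == `l'
    drop `q'
}

* statistics by group within each level of foreign
bysort foreign: tabstat price, by(price5_by) statistics(n mean)
```

Because -xtile- works from quantile cutpoints of the variable itself, tied values cannot be split across groups, which addresses the first of the two major problems listed in the post. The price is that group sizes may be unequal when ties are common or when the number of observations is not a multiple of the number of groups.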