
st: RE: rank error?


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: rank error?
Date   Wed, 28 May 2003 11:25:44 +0100

Jeph Herrin

> According to the documentation, egen's -rank- function should,
> with -field- or -track- switches, give me consecutive ranks. Yet
> when I try:
>
> . egen rank=rank(var1), track
>
> I get non-consecutive ranks:
>
> . tab rank
>
>   track rank |
>   of (beta1) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>            1 |          5        0.83        0.83
>            6 |          5        0.83        1.66
>           11 |          6        1.00        2.66
>           17 |          8        1.33        3.99
>           25 |          8        1.33        5.32
>           33 |          4        0.67        5.99
>           .
>           .
>           .
>
> I've been looking at this for half an hour and finally figured
> either there's something wrong with egen's rank, or there's
> something wrong with me.

Melony E. S. Sorbero

> Based on your table, it looks like ties are included in determining
> the next value in the ranking. You have 5 observations tied with a
> rank of 1, so the next ranking that appears is 6, and so on.

I think Melony is correct.

The documentation is terse, but I don't think either it or the code is
in error.

Let's consider ranking values 1, 2, 2, 2, 3.

1. By default, -egen, rank()- works as follows:

Value 1 has rank 1 (statistical convention: lowest value has lowest
rank).

Values 2, 2, 2 must have the same rank, and that rank is chosen to
preserve the sum of the ranks which would otherwise have been
allocated: ranks 2, 3 and 4 sum to 9, so each tied value gets the
mean rank 9/3 = 3. This "correction for ties" is used in various
nonparametric procedures, such as Spearman rank correlation.

Value 3 has rank 5.
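
The default case can be checked directly. This is a minimal sketch
using the five values above; the variable names x and r_default are
made up for illustration.

```stata
* toy data: the values 1, 2, 2, 2, 3 from the example above
clear
input x
1
2
2
2
3
end

* default: tied values share the mean of the ranks they span
egen r_default = rank(x)
list x r_default
```

The listing should show ranks 1, 3, 3, 3, 5, which still sum to
1 + 2 + 3 + 4 + 5 = 15.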

2. The -track- option, as in -egen, rank() track-, changes how ties
are treated:

Value 1 has rank 1 (rule in track events: lowest value, i.e. lowest
time, has lowest rank).

Values 2, 2, 2 must have the same rank, but it is assigned according
to how many observations have lower values: one observation is lower,
so each gets rank 1 + 1 = 2. (Analogue: in the sports that I know of,
which are not many and which I do not know well, these would all be
second "equal". Of course, many sports have procedures for breaking
ties and/or sufficiently precise timing or scoring that ties don't
arise, but that doesn't affect the principle.)

Value 3 has rank 5.
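
To see the options side by side, here is a small sketch on the same
five values. The variable names are made up for illustration, and
-field- is included only for comparison (it ranks from the highest
value downwards).

```stata
* toy data: the values 1, 2, 2, 2, 3 from the example above
clear
input x
1
2
2
2
3
end

egen r_default = rank(x)          // mean ranks for ties
egen r_track   = rank(x), track   // 1 + number of lower values
egen r_field   = rank(x), field   // 1 + number of higher values
list x r_default r_track r_field
```

Here -track- should give 1, 2, 2, 2, 5 and -field- should give
5, 2, 2, 2, 1. The same rule accounts for Jeph's table: 5 observations
are tied at rank 1, so the next rank is 1 + 5 = 6, then 6 + 5 = 11,
11 + 6 = 17, 17 + 8 = 25 and 25 + 8 = 33.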

The terminology of -track- and -field- was introduced (in STB-51 in
1999) because the authors were not aware of standard alternatives.
Is it misleading?

I think what Jeph may be looking for is the -unique- option.
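
A minimal sketch of -unique- on the same toy values follows; the
variable names are made up. The second command is not something
suggested in this thread. It assumes that "consecutive" might instead
mean that tied observations share a rank and no ranks are skipped, in
which case -egen, group()- is one possibility.

```stata
* toy data: the values 1, 2, 2, 2, 3 from the example above
clear
input x
1
2
2
2
3
end

* unique: ranks 1, ..., N with ties broken arbitrarily
egen r_unique = rank(x), unique

* assumption about the goal: number the distinct values 1, 2, 3, ...
egen r_group = group(x)

list x r_unique r_group
```

With -unique-, the three tied observations get ranks 2, 3 and 4 in
some arbitrary order. With -group()-, they all get 2 and the value 3
gets 3.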

Nick
n.j.cox@durham.ac.uk
