
From: "Newson, Roger B" <r.newson@imperial.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: How to create a rank?
Date: Tue, 4 Mar 2008 20:30:27 -0000

As it happens, at least one statistical reason for using the default ranks of -rank()- is that these are the ranks used in the Wilcoxon ranksum test (see on-line help for -ranksum-). These default ranks can be used to define Somers' D(Y|X), where Y is the variable being ranked and X is a binary variable. Somers' D is the parameter behind the so-called "non-parametric" ranksum test.

If X has values 0 and 1, then Somers' D is the difference between 2 probabilities, namely the probability that a random Y-value from the subpopulation where X==1 is larger than a random Y-value from the subpopulation where X==0, minus the probability that a random Y-value from the subpopulation where X==0 is larger than a random Y-value from the subpopulation where X==1.

If the total sample size is N, and the numbers of X-values equal to 0 and 1 are N_0 and N_1, respectively, and Y_ij is the j'th Y-value in the subsample where X==i for i in {0,1} and j from 1 to N_i, then we have the equality

    D(Y|X) = (2/N) * ( Sum_{j=1}^{N_1} Rank(Y_1j)/N_1  -  Sum_{j=1}^{N_0} Rank(Y_0j)/N_0 )

or, in other words, the difference between the 2 subsample mean ranks multiplied by 2/N. Under these conditions, Somers' D is also known as the rank-biserial correlation (RBC) coefficient of Cureton (1956).

I would like to thank Mike Lacy of Colorado State University for drawing my attention to Edward Cureton's work on Somers' D for binary X, which pre-dates the more general definition of Somers' D for possibly non-binary X (Somers, 1962).

Roger

References

Cureton EE. Rank-biserial correlation. Psychometrika 1956; 21(3): 287-290.

Somers RH. A new asymmetric measure of association for ordinal variables. American Sociological Review 1962; 27: 799-811.
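The equality above can be checked numerically. The following is a minimal sketch in Python rather than Stata, chosen only so that the arithmetic is easy to run anywhere; the function names and the data are made up for illustration and are not part of any Stata command. It computes Somers' D once from all cross-group pairs and once from the difference in subsample mean ranks, using the default tie-averaged ranks.

```python
from itertools import product

def midranks(values):
    """Tie-averaged ranks: tied values share the mean of the ranks they span."""
    return [
        0.5 + 0.5 * sum(v == u for v in values) + sum(v < u for v in values)
        for u in values
    ]

def somers_d_pairs(y0, y1):
    """Pr(Y1 > Y0) - Pr(Y0 > Y1), taken over all N_0 * N_1 cross-group pairs."""
    diff = sum((b > a) - (a > b) for a, b in product(y0, y1))
    return diff / (len(y0) * len(y1))

def somers_d_ranks(y0, y1):
    """(2/N) times the difference between the two subsample mean ranks."""
    ranks = midranks(y0 + y1)          # ranks are taken in the pooled sample
    n0, n1 = len(y0), len(y1)
    mean0 = sum(ranks[:n0]) / n0
    mean1 = sum(ranks[n0:]) / n1
    return (2 / (n0 + n1)) * (mean1 - mean0)

# Hypothetical data, including a tie across the two groups.
y0 = [3, 5, 5, 8]   # Y-values where X == 0
y1 = [5, 9, 10]     # Y-values where X == 1

print(somers_d_pairs(y0, y1), somers_d_ranks(y0, y1))
```

With these made-up values both routes give about 0.667, so the agreement also holds in the presence of ties, which is where the tie-averaged ranks matter.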
Roger B Newson
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/pop genetics/reph/

Opinions expressed are those of the author, not of the institution.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 04 March 2008 18:48
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: How to create a rank?

Roger makes the main point, that -egen, rank()- supports other definitions of ranks that always yield integers. That is explained, although tersely, in the on-line help.

For most statistical purposes, ranks for n values are defined by a choice of whether highest or lowest has rank 1 and two rules:

1. Equal values get equal ranks.

2. The sum of the ranks is always equal to the sum of the integers 1 to n.

The extra options -field-, -track- and -unique- break either 1 or 2 and correspond to occasional requests for something different. I don't know of a statistical case for any of them: the rationale is more likely to be for data reporting or graphics.

The options have their origins in user-written functions written by Richard Goldstein and myself. The original account is accessible in http://www.stata.com/products/stb/journals/stb51.pdf and is more leisurely than the on-line help. Understand that the syntax details in STB-51 do not correspond to current Stata syntax.

Nick
n.j.cox@durham.ac.uk

Newson, Roger B

If you check the entry for -rank()- under -whelp egen-, then you will find that -rank()- supports multiple definitions of ranks.
The default is

    rank(Y_j) = 0.5 + 0.5*Sum_k(Y_k==Y_j) + Sum_k(Y_k<Y_j)

where Y_j is the value of the variable being ranked in the j'th observation, rank(Y_j) is its rank, Y_k is the value in the k'th observation, and Sum_k is the sum over all k from 1 to N, where N is the number of observations being ranked. This definition implies that tied Y_j values are given the mean of the ranks that they would have had, if they had been ranked randomly. This often implies fractional ranks. However, there are other possibilities on offer. You can decide which one is right for your purposes.

Song

Thank you. I solved my problem. By the way, if there exist ties, egen rank = rank(revenue) also creates a problem. In my case, the command produced ***.5, etc. instead of a whole number.

Nick Cox

> For ranks, use -egen, rank()-. It's as simple as that.

Song

>> I am trying to create 'rank' based on total revenues. I used the following
>> command:
>>
>> sort revenue
>> egen rank=group(revenue)
>>
>> The result is that the smallest revenue is 1 and the highest revenue is 100,
>> for example. How can I reverse the rank? I want the highest revenue to be
>> number '1'.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
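The rank definitions discussed in this thread can be illustrated with a short sketch. It is written in Python rather than Stata so that the arithmetic can be run anywhere, and the revenue values are made up. The `field_rank` and `track_rank` functions are an assumption about what the -field- and -track- options compute (1 plus the number of higher values, and 1 plus the number of lower values, with no correction for ties); check the on-line help for -egen- before relying on that reading. Ranking the negated values is one way to make the highest revenue rank 1 while keeping tie-averaged ranks.

```python
def default_rank(values):
    """Default ranks from the formula above: ties get the mean of their ranks."""
    return [
        0.5 + 0.5 * sum(v == u for v in values) + sum(v < u for v in values)
        for u in values
    ]

def track_rank(values):
    """Assumed -track- behaviour: lowest value is 1, no correction for ties."""
    return [1 + sum(v < u for v in values) for u in values]

def field_rank(values):
    """Assumed -field- behaviour: highest value is 1, no correction for ties."""
    return [1 + sum(v > u for v in values) for u in values]

revenue = [120, 300, 300, 75]          # hypothetical data with one tie
n = len(revenue)

lowest_first = default_rank(revenue)                  # [2.0, 3.5, 3.5, 1.0]
highest_first = default_rank([-r for r in revenue])   # [3.0, 1.5, 1.5, 4.0]
field = field_rank(revenue)                           # [3, 1, 1, 4]
track = track_rank(revenue)                           # [2, 3, 3, 1]

# Rule 2: tie-averaged ranks sum to 1 + 2 + ... + n; integer-only ranks with
# ties do not.
print(sum(lowest_first) == n * (n + 1) / 2)   # True
print(sum(field) == n * (n + 1) / 2)          # False
```

The tie-averaged ranks explain the ".5" values Song saw, and they satisfy both of the rules Nick lists. The integer-valued alternatives keep equal ranks for equal values but give up the fixed rank sum.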


