Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: display identifiers accounting for duplicate obs
Date: Mon, 7 May 2012 20:37:20 +0100

Not so. As already said, -egen-'s -rank()- function has options which give other definitions of rank.

In any case, Tashi, what you are saying here is that you don't want -rank()-'s default. Until you say what you do want -- which was re it is difficult to know how to advise you, except to look again at the help for -egen-, look at what that function can do and say exactly what you want instead if it is something different (which I doubt).

Nick

On Mon, May 7, 2012 at 8:28 PM, tashi lama <ltashi32@hotmail.com> wrote:

> I've looked at rank before posting this question. The problem is that it doesn't accomplish what I am trying to get. Rank generates an average for the ties, which is at best very confusing although it will make the reader look at the dataset twice to figure out sth diff... which is "ties" here. The solution by Stas Kolenikov works but can only rank from lowest to highest, lowest getting rank 1.
>
> Nick Cox
>> . search rank
>>
>> would have pointed to -egen- (and much else besides).
>>
>> Apart from the question of how to calculate ranks in Stata, Tashi left
>> open the question of how ranks are defined in any case when
>> duplicates (meaning, ties) are present. The default in Stata uses a
>> rule that will be familiar to students of rank correlation: ties are
>> ranked equally and the average rank is preserved. I don't know that
>> this rule is used anywhere outside statistics.
>>
>> -egen, rank()- has various options in addition to that default. When
>> Richard Goldstein and I were writing what were then extensions to
>> -egen-
>>
>> STB-52  dm72.1 . . . . . . . . . . . . Alternative ranking procedures: update
>>         (help altrank if installed) . . . . . . . . N. J. Cox and R. Goldstein
>>         11/99   p.2; STB Reprints Vol 9, p.51
>>         incorporated into Stata 7.0 egen rank() function
>>
>> STB-51  dm72 . . . . . . . . . . . . . . . . . Alternative ranking procedures
>>         (help altrank, lbleqrnk if installed) . . . N. J. Cox and R. Goldstein
>>         9/99    pp.5--7; STB Reprints Vol 9, pp.48--51
>>         incorporated into Stata 7.0 egen rank() function
>>
>> I spent some time looking for systematic treatments of different
>> ranking rules in various literatures and was surprised to find
>> nothing, so the names "field", "track" and "unique" were introduced
>> faute de mieux. I am still interested in relevant literature
>> references.
>>
>> http://press.princeton.edu/titles/9661.html
>>
>> looks interesting, but I have yet to read it.
>>
>> Nick
>>
>> On Fri, May 4, 2012 at 9:33 PM, Ronnie Babigumira <rb.glists@gmail.com> wrote:
>> > sorry that should have been
>> >
>> > egen rhits = rank(-hits)
>> >
>> > On Friday, May 4, 2012 at 10:32 PM, Ronnie Babigumira wrote:
>> >
>> >> egen rhits = rank(hits)?
>> >>
>> >> On Friday, May 4, 2012 at 10:27 PM, tashi lama wrote:
>> >>
>> >> > I can't come up with this solution despite spending quite some thought and time. The problem in hand sounds fairly straightforward.
>> >> >
>> >> > I have a dataset like the following
>> >> >
>> >> > hits
>> >> >
>> >> > 1
>> >> > 2
>> >> > 3
>> >> > 4
>> >> > 4
>> >> > 5
>> >> > 6
>> >> > 6
>> >> >
>> >> > and I want to generate variable rank. Notice, if there were no duplicate obs, I would have said
>> >> >
>> >> > gsort -hits
>> >> >
>> >> > gen rank=_n
>> >> >
>> >> > and rank column would have given the ranks of the obs. That is what I want.
>> >> >
>> >> > However, there are some duplicate obs and I tried doing
>> >> >
>> >> > gsort -hits
>> >> >
>> >> > gen rank=cond(hits[_n-1]==hits[_n], _n-1, _n)
>> >> >
>> >> > which would give me
>> >> >
>> >> > hits rank
>> >> > 6    1
>> >> > 6    1
>> >> > 5    3
>> >> > 4    4
>> >> > 4    4
>> >> > 3    6
>> >> > 2    7
>> >> > 1    8
>> >> >
>> >> > and that is not what I want.
>> >> >
>> >> > I looked at commands like generate, duplicates and I didn't see much relevant to my problem.
>> >> >
>> >> > Could someone give me a lead where to look or which command I should dig into? Thanks a lot.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
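
The ranking rules mentioned in this thread can be compared side by side on the example data from the original question. This is only a minimal sketch: the variable names r_mean, r_field, r_track and r_unique are made up for illustration, and it does not claim to be what any poster had in mind.

```stata
* Example data from the original question
clear
input hits
1
2
3
4
4
5
6
6
end

* Default rule: tied values share the mean of the ranks they span.
* Ranking -hits gives rank 1 to the highest value.
egen r_mean = rank(-hits)

* field: highest value gets rank 1; ties share the best rank, no averaging
egen r_field = rank(hits), field

* track: lowest value gets rank 1; ties share the best rank, no averaging
egen r_track = rank(hits), track

* unique: ranks 1 to the number of observations, ties broken arbitrarily
egen r_unique = rank(hits), unique

* Show the results from highest to lowest hits
gsort -hits
list, sepby(hits)
```

With these data, the two observations with hits = 6 should get 1.5 under the default rule and 1 under field. The field ranks, read from the top, are 1, 1, 3, 4, 4, 6, 7, 8, which is the same result as the -cond()- attempt quoted in the question.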

**Follow-Ups**:
- **Re: st: display identifiers accounting for duplicate obs** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:
- **st: display identifiers accounting for duplicate obs** *From:* tashi lama <ltashi32@hotmail.com>
- **Re: st: display identifiers accounting for duplicate obs** *From:* Ronnie Babigumira <rb.glists@gmail.com>
- **Re: st: display identifiers accounting for duplicate obs** *From:* Ronnie Babigumira <rb.glists@gmail.com>
- **Re: st: display identifiers accounting for duplicate obs** *From:* Nick Cox <njcoxstata@gmail.com>
- **RE: st: display identifiers accounting for duplicate obs** *From:* tashi lama <ltashi32@hotmail.com>
