
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: display identifiers accounting for duplicate obs


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: display identifiers accounting for duplicate obs
Date   Mon, 7 May 2012 20:38:18 +0100

Slightly premature send.

On Mon, May 7, 2012 at 8:37 PM, Nick Cox <[email protected]> wrote:

> Not so. As already said, -egen-'s -rank()- function has options which
> give other definitions of rank.
>
> In any case, Tashi, what you are saying here is that you don't want
> -rank()-'s default. Until you say what you do want -- which was requested before -- it
> is difficult to know how to advise you, except to look again at the
> help for -egen-, look at what that function can do and say exactly
> what you want instead if it is something different (which I doubt).
>
> Nick
>
> On Mon, May 7, 2012 at 8:28 PM, tashi lama <[email protected]> wrote:
>
>> I looked at rank before posting this question. The problem is that it doesn't accomplish what I am trying to get. Rank generates an average for the ties, which is at best very confusing, although it will make the reader look at the dataset twice to figure out that something is different, namely the "ties" here. The solution by Stas Kolenikov works but can only rank from lowest to highest, with the lowest getting rank 1.
>
> Nick Cox
>
>>> . search rank
>>>
>>> would have pointed to -egen- (and much else besides).
>>>
>>> Apart from the question of how to calculate ranks in Stata, Tashi left
>>> open the question of how ranks are defined in any case when
>>> duplicates (meaning, ties) are present. The default in Stata uses a
>>> rule that will be familiar to students of rank correlation: ties are
>>> ranked equally and the average rank is preserved. I don't know that
>>> this rule is used anywhere outside statistics.
>>>
>>> -egen, rank()- has various options in addition to that default. When
>>> Richard Goldstein and I were writing what were then extensions to
>>> -egen-
>>>
>>> STB-52 dm72.1 . . . . . . . . . . . . Alternative ranking procedures: update
>>> (help altrank if installed) . . . . . . . . N. J. Cox and R. Goldstein
>>> 11/99 p.2; STB Reprints Vol 9, p.51
>>> incorporated into Stata 7.0 egen rank() function
>>>
>>> STB-51 dm72 . . . . . . . . . . . . . . . . . Alternative ranking procedures
>>> (help altrank, lbleqrnk if installed) . . . N. J. Cox and R. Goldstein
>>> 9/99 pp.5--7; STB Reprints Vol 9, pp.48--51
>>> incorporated into Stata 7.0 egen rank() function
>>>
>>> I spent some time looking for systematic treatments of different
>>> ranking rules in various literatures and was surprised to find
>>> nothing, so the names "field", "track" and "unique" were introduced
>>> faute de mieux. I am still interested in relevant literature
>>> references.
>>>
>>> http://press.princeton.edu/titles/9661.html
>>>
>>> looks interesting, but I have yet to read it.
>>>
>>> Nick
>>>
>>> On Fri, May 4, 2012 at 9:33 PM, Ronnie Babigumira <[email protected]> wrote:
>>> > sorry that should have been
>>> >
>>> > egen rhits = rank(-hits)
>>>
>>> On Friday, May 4, 2012 at 10:32 PM, Ronnie Babigumira wrote:
>>>
>>> >> egen rhits = rank(hits)?
>>>
>>> On Friday, May 4, 2012 at 10:27 PM, tashi lama wrote:
>>> >
>>> >> > I can't come up with a solution despite spending quite some thought and time. The problem at hand sounds fairly straightforward.
>>> >> >
>>> >> > I have a dataset like the following
>>> >> >
>>> >> > hits
>>> >> >
>>> >> > 1
>>> >> > 2
>>> >> > 3
>>> >> > 4
>>> >> > 4
>>> >> > 5
>>> >> > 6
>>> >> > 6
>>> >> >
>>> >> > and I want to generate a variable rank. Notice, if there were no duplicate obs, I would have said
>>> >> >
>>> >> >
>>> >> > gsort -hits
>>> >> >
>>> >> > gen rank=_n
>>> >> >
>>> >> > and the rank column would have given the ranks of the obs. That is what I want.
>>> >> >
>>> >> >
>>> >> > However, there are some duplicate obs and I tried doing
>>> >> >
>>> >> > gsort -hits
>>> >> >
>>> >> > gen rank=cond(hits[_n-1]==hits[_n], _n-1, _n)
>>> >> >
>>> >> > which would give me
>>> >> >
>>> >> >
>>> >> > hits  rank
>>> >> > 6     1
>>> >> > 6     1
>>> >> > 5     3
>>> >> > 4     4
>>> >> > 4     4
>>> >> > 3     6
>>> >> > 2     7
>>> >> > 1     8
>>> >> >
>>> >> > and that is not what I want.
>>> >> >
>>> >> >
>>> >> >
>>> >> > I looked at commands like generate and duplicates, and I didn't see much relevant to my problem.
>>> >> >
>>> >> >
>>> >> >
>>> >> > Could someone give me a lead on where to look or which command I should dig into? Thanks a lot.
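
For readers comparing the definitions of rank discussed in this thread, here is a minimal sketch that applies -egen-'s -rank()- function to the eight values of hits from the original question. The variable names r_default, r_desc, r_field, r_track and r_unique are made up for illustration. The thread never settles which definition Tashi wanted, so this only shows what each documented option returns, not a recommended answer.

```stata
* Rebuild the example data from the original question
clear
input hits
1
2
3
4
4
5
6
6
end

* Default: lowest value gets rank 1, ties share the average rank
egen r_default = rank(hits)

* Negating the argument (as in Ronnie's reply) ranks highest first,
* still averaging ties
egen r_desc = rank(-hits)

* field: highest value gets rank 1, ties share the best rank, gaps follow
egen r_field = rank(hits), field

* track: lowest value gets rank 1, ties share the best rank, gaps follow
egen r_track = rank(hits), track

* unique: ranks 1 to N, ties broken arbitrarily
egen r_unique = rank(hits), unique

gsort -hits
list, noobs separator(0)
```

With these data the two observations with hits equal to 4 get 4.5 under the default, and the two with hits equal to 6 get 1.5 under rank(-hits). The field option gives 1 1 3 4 4 6 7 8 from highest to lowest, which matches the -cond()- result shown in the original question; track gives 1 2 3 4 4 6 7 7 from lowest to highest. The unique ranks within tied values are arbitrary and may differ from run to run.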
