Statalist



st: RE: AW: AW: AW: Error in egen rank(), unique?


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: AW: AW: AW: Error in egen rank(), unique?
Date   Tue, 3 Nov 2009 16:08:20 -0000

Various points have arisen in this thread, so let's try to tease them
apart. 

1. Marc's -replace- had an unintended effect of overwriting some values
that were fine for his purposes in a variable he created earlier with
-egen, rank() unique-. In essence, getting the changes he wanted
depended delicately on getting the data in exactly the right -sort-
order, as he later realised. Thus there is, emphatically, no error in
-egen- here; Marc merely later stomped by accident on some of its
results and pointed the finger of blame in the wrong direction, as no
doubt we all do occasionally. 
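
A minimal sketch of how that collision can arise, using made-up data
for a single id (the values are illustrative monthly dates, not Marc's
data). The missings sit in positions 3 and 5, so filling them with _n
reuses a rank that -egen- has already assigned:

```stata
* Hypothetical data: one id, five episodes, missings not sorted last
clear
input id dbep
1 571
1 580
1 .
1 590
1 .
end
format dbep %tm

* Fix the within-id order so that the example is reproducible
sort id, stable

* Nonmissing values get ranks 1, 2, 3; missings get missing ranks
by id: egen rank_dbep = rank(dbep), unique

* _n is the position within id in the current order, so the missing in
* position 3 is given 3, which is already in use
by id: replace rank_dbep = _n if rank_dbep == .

list, sepby(id)
duplicates report id rank_dbep
```

Sorting on id and dbep first puts the missings last within each id, so
that _n for the missings starts above the highest rank already assigned.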

2. Marc maintains that -egen, rank() unique- does not do what he expects
when there are missings. At least, that is what I think he means; he
doesn't quite spell out the argument. This is a little arguable, but my
own suggestion is that the code is on balance fine as it is. 

2.1 -egen, rank()- ignores missings and I suggest that that is exactly
the right behaviour, at least as a default. If a value is missing, so
also should its rank be. 

2.2 I can see a very small case for an option -missing- that says, "I
want to rank the missings too". That would involve taking Stata's
conventions about . .a .b ... .z very literally. I am counting votes
here tacitly because I can't remember a suggestion like Marc's before.
Clearly, Marc wants this for data management; I can't see it being
wanted statistically. 

3. However, for what Marc wants there is an even easier solution 

-bys id (dbep): gen rank_dbep = _n 

-- which is shorter and more efficient. With lots of values, consider
spelling out -long- or -double- as variable type. With this easy
work-around the case for Marc's suggestion looks very small indeed to
me, but the code in question has long since been officially adopted, and
it's StataCorp's call. 
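
A sketch of that work-around on made-up data (one id, values invented
for illustration). Missings sort after all nonmissing values, so they
pick up the highest running numbers, and the result identifies
observations within id by construction:

```stata
* Hypothetical data: one id, two missing start dates
clear
input id dbep
1 571
1 580
1 .
1 590
1 .
end
format dbep %tm

* Sort within id on dbep and number the observations; missings come last
bysort id (dbep): gen long rank_dbep = _n

* -isid- exits with an error if id and rank_dbep do not identify observations
isid id rank_dbep
list, sepby(id)
```

As with the -unique- option, the order among tied values (here the two
missings) is arbitrary.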

4. By the way, using -duplicates- immediately after the -egen- in
question would have been one way to check whether it was at fault. 
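
For example, something like the following, again on invented data with
one tie and one missing. If -egen- were at fault, the report would show
surplus observations among the nonmissing ranks:

```stata
* Hypothetical data: id 2 has two tied values, id 1 has a missing
clear
input id dbep
1 571
1 580
1 .
1 590
2 575
2 575
end

bysort id: egen rank_dbep = rank(dbep), unique

* Check straight after -egen-, before any -replace- touches the variable
duplicates report id rank_dbep if !missing(rank_dbep)

* The number of distinct id-rank pairs should equal the number checked
assert r(unique_value) == r(N)
```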

Nick 
n.j.cox@durham.ac.uk 

Martin Weiss

Honestly, I cannot see the point of the problem, but note that your

*************
sort id dbep
by id, sort: egen rank_dbep = rank(dbep), unique
*************

can be telescoped into

-bys id (dbep): egen rank_dbep = rank(dbep), unique-

Kaulisch, Marc

Thanks for your answer. Your example works just fine, as do the same
commands with a different dataset. Deleting the subscript [_n] does not
change the behaviour.

I found a workaround. The problem was that the variable dbep (formatted
as %tm) was not in sorted order within id:
dbep1 dbep2 dbep3 dbep4 dbep5
2007m8 2008m5 . 2009m3 .

This sequence was ranked:
1 2 3 3

After inserting
. sort id dbep
before the
. by id, sort: egen rank_dbep = rank(dbep), unique
it works fine.

Nonetheless I consider this behaviour inconsistent with the description
of the -egen, rank() unique- function, which says: "The unique option
calculates the unique rank of exp: values are ranked 1,...,#, and values
and ties are broken arbitrarily.  Two values that are tied for second
are ranked 2 and 3." (. h egen)

Martin Weiss

No problem occurs in this code, so where is the material difference from
yours?

*************
clear*
set obs 10

gen id=_n
gen dbep1=5+int(10*runiform())
gen dbep2=5+int(10*runiform())
gen deep1=5+int(10*runiform())
gen deep2=5+int(10*runiform())
gen durep1=rnormal()
gen durep2=rnormal()

reshape long dbep deep durep, i(id) j(episode)
by id, sort: egen rank_dbep = rank(dbep), unique
by id, sort: egen rank_deep = rank(deep), unique
by id, sort: replace rank_dbep =_n if rank_dbep[_n] == .
sort id rank_dbep
drop episode
reshape wide dbep deep durep rank_deep, i(id) j(rank_dbep)
*************

You do change the values returned by -egen, rank()- with your -replace-
line, so it is hard to argue that -egen- is at fault. Still, the line
clearly intends to replace the rank by its running number if it is
missing. So insert something like -inspect rank_dbep- before that line
to see whether there are any missings in the first place.

Also note that the -if rank_dbep[_n] == .- qualifier could easily be
-if rank_dbep == .-...

Kaulisch, Marc

I have a problem with an egen rank(), unique command (Stata version
10.1). It does not seem to produce unique values as I expect.
 
This is my attempt to rank episodes by their beginning (dbep). First I
reshape the dataset into long format.
 
. reshape long dbep deep durep, i(id) j(episode)
 
. by id, sort: egen rank_dbep = rank(dbep), unique
. by id, sort: egen rank_deep = rank(deep), unique
 
. by id, sort: replace rank_dbep = _n if rank_dbep[_n] == .
 
. sort id rank_dbep
 
When I then try to transform this dataset back into wide format:
 
. drop episode
 
. reshape wide dbep deep durep rank_deep , i(id) j(rank_dbep)
 
I receive the following error:
 
rank_dbep not unique within id;
there are multiple observations at the same rank_dbep within id.
Type "reshape error" for a listing of the problem observations.
r(9);
 
-reshape error- reports 15 cases in which rank_dbep is not unique.
 
Strangely enough, the same commands work just fine with a different
dataset.
