Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: J Taylor <jwtayl@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: AW: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter)
Date: Sat, 24 Jul 2010 02:14:17 -0700

Thank you Nick and Martin, this is just what I was looking for.

JT

On Fri, Jul 23, 2010 at 5:30 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Good question.
>
> Your identifier would lead to an integer variable with labels if
> -group()- were used with the -label- option. A good identifier should
> be informative as well as distinct, so I regard using -label- as very
> good practice. I didn't spot that you weren't following that very good
> practice. My mistake.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Martin Weiss
>
> "Which could in turn be made simpler:"
>
> Though the two approaches hardly lead to the same result. My notion of an
> "ID", as originally requested, would not be a string such as "England
> France", but a numeric variable running from 1 to N, with N the number of
> distinct groups.
>
> *************
> clear*
> inp str20 c1id str20 c2id
> "US" "Canada"
> "US" "Mexico"
> "Canada" "US"
> "US" "France"
> "France" "England"
> "France" "US"
> end
>
> gen newid = cond(c1id < c2id, c1id, c2id) /*
> */ + " " + cond(c2id < c1id, c1id, c2id)
>
> sort newid
>
> l, sepby(newid) noo
> *************
>
> What makes you think that my approach returns an "...integer variable with
> labels."? All I can find is a -varlabel- attached to my newid.
>
> . d newid
>
>               storage  display     value
> variable name   type   format      label      variable label
> -------------------------------------------------------------------
> newid           float  %9.0g                  group(first second)
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Friday, 23 July 2010 13:59
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: AW: AW: Creating a Group Pair ID (where the generating
> variables order shouldn't matter)
>
> Which could in turn be made simpler:
>
> gen first = cond(c1id < c2id, c1id, c2id)
> gen second = cond(c2id < c1id, c1id, c2id)
> egen newid = group(first second)
> drop first second
> sort newid
>
> could become
>
> gen newid = cond(c1id < c2id, c1id, c2id) + " " + cond(c2id < c1id,
> c1id, c2id)
> sort newid
>
> The cost is greater storage, which may or may not bite: that is, -newid-
> is a string variable rather than an integer variable with labels. But if
> you have enough space to create -first- and -second- as string
> variables, even fleetingly, you presumably have enough space for a
> string -newid-.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Martin Weiss
>
> Essentially, the technique advocated in NJC's tip boils down to a simple
> trick:
>
> *************
> clear*
> inp str20 c1id str20 c2id
> "US" "Canada"
> "US" "Mexico"
> "Canada" "US"
> "US" "France"
> "France" "England"
> "France" "US"
> end
>
> gen first = cond(c1id < c2id, c1id, c2id)
> gen second = cond(c2id < c1id, c1id, c2id)
>
> egen newid = group(first second)
>
> drop first second
> sort newid
>
> l, sepby(newid) noo
> *************
>
> Martin Weiss
>
> Try NJC's http://www.stata-journal.com/article.html?article=dm0043
>
> J Taylor
>
> I am trying to create an ID corresponding to numbers from two lists.
> For example, if the two lists were of countries, one would have
>
> clear
> input str20 c1id str20 c2id
> "US" "Canada"
> "US" "Mexico"
> "Canada" "US"
> "US" "France"
> "France" "England"
> "France" "US"
> end
> egen newid = group(c1id c2id)
>
> I would like newid to create an ID pair for each country pair. My
> first instinct was to use the egen group command. However, the
> problem is that egen group takes into account which id comes first.
> For example, (c1id,c2id)=(United States,Canada) and
> (c1id,c2id)=(Canada,United States) have different IDs. I would like
> them to have the same ID. That is, I would like to create newid as a
> group pair ID, reflecting which two countries are in the pair, and
> where the order doesn't matter.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
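
Nick's remark about the -label- option can be illustrated with a minimal sketch that reuses the thread's own data and variable names. It assumes the -label- option of -egen, group()-, which builds a value label from the values of the grouping variables. The comments are added here for illustration and are not part of any poster's code.

```stata
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* Put each pair in alphabetical order, so that (US, Canada)
* and (Canada, US) give the same first and second
gen first  = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)

* Numeric ID, one value per distinct unordered pair;
* -label- attaches a value label made from first and second
egen newid = group(first second), label

drop first second
sort newid
list, sepby(newid) noobs

* Check the storage type and whether a value label is attached
describe newid
```

With these six observations there are four distinct unordered pairs, so -newid- should take four values. The variable stays numeric, as Martin wanted, while the value label keeps it informative in listings, which is the point Nick makes about -label-.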