st: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter)

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter)
Date	Fri, 23 Jul 2010 12:58:46 +0100

Which could in turn be made simpler: 

gen first = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)
egen newid = group(first second)
drop first second
sort newid

could become 

gen newid = cond(c1id < c2id, c1id, c2id) + " " + cond(c2id < c1id, c1id, c2id)
sort newid

The cost is greater storage, which may or may not bite: -newid- is now
a string variable rather than an integer variable (to which -egen,
group()- can attach value labels through its -label- option). But if
you have enough space to create -first- and -second- as string
variables, even fleetingly, you presumably have enough space for a
string -newid-.

Nick 
[email protected] 
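
The trade-off described above can be checked on the thread's example data. What follows is only a minimal sketch: the variable names -strid- and -intid- are hypothetical, introduced here so that both versions can sit side by side. It builds the string ID and the integer ID, asserts that they define the same groups, and shows their storage types.

```stata
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* string pair ID: alphabetically smaller name first, larger name second
gen strid = cond(c1id < c2id, c1id, c2id) + " " + cond(c2id < c1id, c1id, c2id)

* integer pair ID built from the same ordered pair
gen first = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)
egen intid = group(first second)
drop first second

* the two IDs should define exactly the same groups
bysort strid (intid): assert intid[1] == intid[_N]
bysort intid (strid): assert strid[1] == strid[_N]

* compare the storage types of the two IDs
describe strid intid
```

One caveat on the separator: if the identifiers can themselves contain spaces, joining with " " may map distinct pairs to the same string. For instance, ("A B", "C") and ("A", "B C") would both give "A B C". A separator that cannot occur in the data avoids this.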

Martin Weiss wrote:

Essentially, the technique advocated in NJC's tip boils down to a simple
trick:

*************
clear *
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* order each pair alphabetically: first = smaller name, second = larger
gen first = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)

* one integer ID per unordered pair
egen newid = group(first second)

drop first second
sort newid

list, sepby(newid) noobs
*************
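
If readable pair names matter, the integer ID can carry them as value labels. This is a minimal sketch on the same example data, assuming the -label- option of -egen, group()-; it is a variation on the code above, not part of the original suggestion.

```stata
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* order each pair alphabetically so that (a, b) and (b, a) coincide
gen first = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)

* label: integer codes 1, 2, ... with the values of first and second as value labels
egen newid = group(first second), label

drop first second
sort newid
list, sepby(newid) noobs
```

The ID stays an integer, so storage is small, while -list- and -tabulate- display the pair names rather than bare numbers.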

Martin Weiss wrote:

Try NJC's tip: http://www.stata-journal.com/article.html?article=dm0043

J Taylor wrote:

I am trying to create an ID corresponding to pairs of values from two
lists. For example, if the two lists were of countries, one would have

clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end
egen newid = group(c1id c2id)

I would like newid to identify each country pair.  My first instinct
was to use egen's group() function.  However, the problem is that
group() takes into account which ID comes first.  For example,
(c1id, c2id) = (US, Canada) and (c1id, c2id) = (Canada, US) get
different IDs.  I would like them to have the same ID.  That is, I
would like newid to be a group pair ID that reflects which two
countries are in the pair, regardless of their order.
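
To make the problem concrete, here is a minimal sketch on the example data. It counts the distinct IDs that -egen, group()- produces when the two variables are used as they stand.

```stata
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* group() treats (US, Canada) and (Canada, US) as different combinations
egen newid = group(c1id c2id)
sort newid
list, sepby(newid) noobs

* number of distinct IDs: one per ordered pair
quietly tabulate newid
display r(r)
```

The six rows are six distinct ordered pairs, so every row gets its own ID, even though only four unordered country pairs are present.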

