Re: st: RE: AW: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter)

From	J Taylor <[email protected]>
To	[email protected]
Subject	Re: st: RE: AW: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter)
Date	Sat, 24 Jul 2010 02:14:17 -0700

Thank you Nick and Martin, this is just what I was looking for.

JT

On Fri, Jul 23, 2010 at 5:30 AM, Nick Cox <[email protected]> wrote:
> Good question.
>
> Your identifier would lead to an integer variable with labels if -group()- were used with the -label- option. A good identifier should be informative as well as distinct, so I regard using -label- as very good practice. I didn't spot that you weren't following that very good practice. My mistake.
>
> Nick
> [email protected]
>
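
A minimal sketch of the -label- suggestion above, assuming the six-row example data used elsewhere in this thread. The helper variables first and second are taken from Martin's code; the only change is the -label- option, which should give an integer newid whose value labels show the sorted pair.

```stata
* Sketch only: same example data as elsewhere in this thread
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* Put each pair in alphabetical order so that (US, Canada) and (Canada, US) match
gen first  = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)

* -label- attaches value labels built from the values of first and second
egen newid = group(first second), label

drop first second
sort newid
list, sepby(newid) noobs
describe newid
```

With these data the six rows collapse to four groups, and -describe newid- should now show a value label in addition to the variable label.
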
> Martin Weiss
>
> "Which could in turn be made simpler:"
>
>
> Though the two approaches hardly lead to the same result. My notion of an
> "ID", as originally requested, would not be a string such as "England
> France", but a numeric variable running from 1 to N, with N the number of
> distinct groups.
>
> *************
> clear*
> inp str20 c1id str20 c2id
> "US" "Canada"
> "US" "Mexico"
> "Canada" "US"
> "US" "France"
> "France" "England"
> "France" "US"
> end
>
> gen newid = cond(c1id < c2id, c1id, c2id)  /*
> */ + " " + cond(c2id < c1id, c1id, c2id)
>
> sort newid
>
> l, sepby(newid) noo
> *************
>
>
> What makes you think that my approach returns an "... integer variable with
> labels"? All I can find is a variable label attached to my newid.
>
>
>
>
> . d newid
>
>              storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------------
> newid           float  %9.0g                  group(first second)
>
>
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Friday, 23 July 2010 13:59
> To: [email protected]
> Subject: st: RE: AW: AW: Creating a Group Pair ID (where the generating
> variables order shouldn't matter)
>
> Which could in turn be made simpler:
>
> gen first = cond(c1id < c2id, c1id, c2id)
> gen second = cond(c2id < c1id, c1id, c2id)
> egen newid = group(first second)
> drop first second
> sort newid
>
> could become
>
> gen newid = cond(c1id < c2id, c1id, c2id) + " " + ///
>     cond(c2id < c1id, c1id, c2id)
> sort newid
>
> The cost is greater storage, which may or may not bite: that is, -newid-
> is a string variable rather than an integer variable with labels. But if
> you have enough space to create -first- and -second- as string
> variables, even fleetingly, you presumably have enough space for a
> string -newid-.
>
> Nick
> [email protected]
>
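
If the storage cost of a string identifier does bite, one possible follow-up, not proposed in the thread itself, is to -encode- the string into a labelled integer afterwards. A sketch, assuming the same example data; the name pairid is made up for illustration.

```stata
* Sketch only: same example data as elsewhere in this thread
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* One-line string identifier: smaller name first, larger name second
gen newid = cond(c1id < c2id, c1id, c2id) + " " + ///
    cond(c2id < c1id, c1id, c2id)

* pairid is a hypothetical name: an integer code with the strings as value labels
encode newid, generate(pairid)
drop newid

sort pairid
list, sepby(pairid) noobs
```

-encode- numbers the distinct strings in alphabetical order, so the result is close to what -egen, group()- with -label- gives, at the price of creating the string variable first.
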
> Martin Weiss
>
> Essentially, the technique advocated in NJC's tip boils down to a simple
> trick:
>
> *************
> clear*
> inp str20 c1id str20 c2id
> "US" "Canada"
> "US" "Mexico"
> "Canada" "US"
> "US" "France"
> "France" "England"
> "France" "US"
> end
>
> gen first = cond(c1id < c2id, c1id, c2id)
> gen second = cond(c2id < c1id, c1id, c2id)
>
> egen newid = group(first second)
>
> drop first second
> sort newid
>
> l, sepby(newid) noo
> *************
>
> Martin Weiss
>
> Try NJC's tip: http://www.stata-journal.com/article.html?article=dm0043
>
> J Taylor
>
> I am trying to create an ID corresponding to numbers from two lists.
> For example, if the two lists were of countries, one would have
>
> clear
> input str20 c1id str20 c2id
> "US" "Canada"
> "US" "Mexico"
> "Canada" "US"
> "US" "France"
> "France" "England"
> "France" "US"
> end
> egen newid = group(c1id c2id)
>
> I would like newid to be an ID for each country pair.  My
> first instinct was to use -egen- with group().  However, the
> problem is that group() takes into account which ID comes first.
> For example, (c1id, c2id) = (US, Canada) and (c1id, c2id) =
> (Canada, US) get different IDs.  I would like them to
> have the same ID.  That is, I would like newid to be a
> group pair ID that reflects which two countries are in the pair,
> regardless of their order.
>
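
The order dependence described above can be seen by counting the groups that -egen, group()- produces on the example data. A small sketch; rawid is a hypothetical name used only here.

```stata
* Sketch only: same example data as in the question
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* group() treats (US, Canada) and (Canada, US) as different combinations
egen rawid = group(c1id c2id)

* Six distinct IDs result, although only four unordered pairs exist
quietly summarize rawid
assert r(max) == 6

list, noobs
```

Each of the six rows gets its own ID here, which is why the replies in this thread sort the two names within each row before calling group().
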
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
