# st: RE: RE: Identifier from three variables

From: n j cox
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: Identifier from three variables
Date: Mon, 30 Jul 2007 16:22:11 +0100

This isn't another way of doing it. It is a variation
on what was already suggested, indeed by myself (#3 in
the post to which this post replies).

Generating a numeric identifier by this method has no
advantages, and some disadvantages, compared with
concatenation of strings:

1. You need to check the limits of each identifier,
as Rafael points out, because the result will be legal to Stata
even if a limit is exceeded.

2. If you forget #1 or make a mistake on #1, the resulting
problems may not be obvious.

3. Here the default variable type is a -float-. Using
that to hold (very large) integers could run into precision
problems. Rafael is careful to recommend using a -double-
if needed but the problem is that everyone has to remember to do
that.

4. Concatenation can make use of separators, as in

egen id = concat(house family individual), p("_")

5. The reverse engineering from composite to
individual identifiers is easier with -split-
(provided you follow #4) than it usually is
with composite integers.
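
As a minimal sketch of #4 and #5 together, assuming toy data like the example in the original question: the -format()- option is added here only to get leading zeros, and the stub name part is purely illustrative.

```stata
* Toy data matching the example in the original question
clear
input house family individual
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
end

* Composite string identifier with "_" between the parts;
* format(%02.0f) pads each part to two digits
egen id = concat(house family individual), format(%02.0f) punct("_")

* Reverse engineering: -split- creates part1, part2, part3
* and the -destring- option converts them back to numeric
split id, parse("_") generate(part) destring

list, sepby(house)
```

Without the separator, -split- would have nothing to parse on, and you would have to fall back on -substr()- with fixed positions.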

Rafael's method, which is quite often used, is relatively
simple and can be unproblematic, but it is not a good
general method in my view.
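
The precision problem in #3 can be seen in a small sketch. The house number 7000 is a made-up value, chosen only so that the composite exceeds the range in which a -float- holds integers exactly (up to 16,777,216), and the default type is assumed to be -float-.

```stata
clear
input house family individual
7000 1 1
7000 1 2
end

* Default storage type is float: the two distinct people
* end up with the same stored value
generate id_float = house*10000 + family*100 + individual

* Same expression held as a double: the two rows stay distinct
generate double id_double = house*10000 + family*100 + individual

format %12.0f id_float id_double
list
```

Nothing warns you here: the -float- version is created without complaint, which is the point of #2.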

Nick
n.j.cox@durham.ac.uk

Rafael Osorio

Another way of doing it, supposing each identifier is <= 99:

gen id = house*10000+family*100+individual

Doing it this way, you can tell which house and family a specific person belongs to. If your data were not sorted by the identifiers, sorting by this new identifier gives the same order as sorting by all three identifiers. If your final id is large: gen double id...
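
A hedged sketch of how the limits could be checked before building such an identifier, assuming two digits are reserved for each component (so the multipliers are 10000 and 100) and using toy data like the example in the original question:

```stata
clear
input house family individual
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
end

* Stop with an error if any component is missing, not an integer,
* or outside 0-99
foreach v of varlist house family individual {
    assert inrange(`v', 0, 99) & `v' == floor(`v')
}

* Two digits per component; a -long- holds a six-digit result exactly
generate long id = house*10000 + family*100 + individual

* Confirm that the new variable uniquely identifies observations
isid id
sort id
```

If either -assert- or -isid- fails, Stata stops with an error, so a mistake on the limits shows up at once rather than later.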

Nick Cox

1. Don't do that. Use -egen, group()- with the -label- option.

FAQ . . . . . . . . . . . . . . . . . . . . . . Creating group identifiers
3/01 How do I create individual identifiers numbered
from 1 upwards?
http://www.stata.com/support/faqs/data/group.html

3. If you really must, look into -egen, concat()-, including
its options.
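
For concreteness, a minimal sketch of #1 on toy data like the example in the original question. The -label- option attaches a value label built from the original values to each group number, so the information in the three variables remains visible.

```stata
clear
input house family individual
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
end

* Identifiers numbered from 1 upwards, one per distinct
* combination of house, family and individual
egen id = group(house family individual), label

* -nolabel- shows the underlying integers rather than the labels
list, sepby(house) nolabel
list, sepby(house)
```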

Nick
n.j.cox@durham.ac.uk

Nádia N. Simões
>
> How can I generate an identifier from three variables?
> In my data I have a column for the house, one for the family
> and one for the individual
>
> example:
>
> house family individual
> 1 1 1
> 1 1 2
> 1 1 3
> 2 1 1
> 2 1 2
> ...
>
> and I would like to know how can I create an identifier per
> individual such as:
>
> individual
> 010101
> 010102
> 010103
> 020101
> ...
