The Stata listserver

Re: st: combine 3 numeric vars. to create 1 unique id. var

From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: combine 3 numeric vars. to create 1 unique id. var
Date   Tue, 13 Aug 2002 10:44:32 -0400

At 10:14 AM 8/13/2002 +0600, Nyima wrote:

> I need to create a unique id variable by combining three different numeric variables, viz. "district", "house" and "person". "District" has been entered as a four-digit number code (1000 to 1500). Each "district" has "house"s numbered 1 to 150 (no zero in front), and each "house" has "person"s numbered 1 to 12 (again no zero in front). I need the unique id variable to merge this dataset with another data file having other information about these particular people.

First, does the other data set have the same three identifiers? If so, then there is no need to combine them into one variable. Use the three together as your key:

merge district house person using ...
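As a rough sketch of the three-variable key in use, the do-file fragment below builds two toy datasets and merges them. The data values and the variable age are invented for illustration. It also assumes a newer Stata than the one current when this was posted: isid and the merge 1:1 form were added in later releases.

```stata
* Toy "master" data; all values are made up for illustration
clear
input district house person
1000 1 1
1000 1 2
1001 12 1
end
* isid stops with an error if the three variables do not uniquely identify rows
isid district house person
tempfile master
save `master'

* Toy "other" data with the same three identifiers plus one extra variable
clear
input district house person age
1000 1 1 34
1000 1 2 31
1001 12 1 58
end
isid district house person

* One-to-one merge on the composite key (merge 1:1 needs Stata 11 or later)
merge 1:1 district house person using `master'
tabulate _merge
```

Checking the key with isid in both files before merging is a cheap way to find duplicates that would otherwise surface as a merge error or, in older syntax, as silently mismatched rows.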

But if you still want to combine them into one variable, do this:

assert district >=0 & district <=1500
assert house >=0 & house <=150
assert person >=0 & person <=12

gen long newvar = (district * 100000) + (house * 100) + person
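To see what the formula produces, here is a small sketch on invented values. The second and third rows are chosen because simply pasting the digits together would give the same string, "1000111", for both; the positional formula keeps them apart.

```stata
clear
input district house person
1234 56 7
1000 11 1
1000 1 11
end
gen long newvar = (district * 100000) + (house * 100) + person
format newvar %9.0f
list, noobs
* Expected newvar values: 123405607, 100001101, 100000111
isid newvar
```

Storing the result as long matters here: a float carries only about seven significant digits, so nine-digit ids stored as float would be rounded and could collide.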

Note that the upper bound on person could be as high as 99, and the upper bound on house as high as 999; the mapping would still be one-to-one, because person occupies the last two digits and house the three digits before them. The upper bound on district could also be higher; it is there to ensure that you don't get numeric overflow in a long.
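If you want to know how far the district bound could be raised, a quick calculation along these lines should do. It assumes house and person are allowed their full ranges (999 and 99) and uses c(maxlong), the largest value Stata can store in a long.

```stata
* Largest value a long can hold (2,147,483,620)
display %12.0f c(maxlong)
* Largest district code that cannot overflow, leaving room for
* house = 999 and person = 99 (that is, 99999 in the low five digits)
display floor((c(maxlong) - 99999) / 100000)
* Displays 21473
```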

(The coefficients are powers of 10; that isn't absolutely necessary, but it is convenient for human readers of the resulting numbers. Smaller coefficients can be used for a more compact mapping. But in any case, the coefficients must be tailored to the ranges of values in person and house.)
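As one possible sketch of the more compact mapping, the coefficients below are taken from the asserted ranges: house can take 151 values (0 to 150) and person 13 values (0 to 12). The variable names compact, d2, h2, and p2 are made up for this example.

```stata
clear
input district house person
1234 56 7
1500 150 12
1000 0 0
end
* Mixed-radix encoding: 13 slots for person, 151 slots for house
gen long compact = (district * 151 + house) * 13 + person
isid compact

* Decode with the same coefficients and confirm nothing was lost
gen p2 = mod(compact, 13)
gen h2 = mod(floor(compact / 13), 151)
gen d2 = floor(compact / (13 * 151))
assert p2 == person & h2 == house & d2 == district
list, noobs
```

The first row encodes to 2423077, which is shorter than the power-of-10 version but no longer readable at a glance. If the ranges of house or person ever grow, these coefficients must grow with them.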

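Given this, you can also go backwards, taking your newvar and deriving the district, house, and person. (If the original three variables are still in the dataset, drop them first or choose different names, since gen will not overwrite an existing variable.)

gen byte person = mod(newvar, 100)
gen int house = mod(int(newvar/100), 1000)
gen int district = int(newvar/100000)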
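A round-trip check on toy data might look like the sketch below. The names person2, house2, and district2 are hypothetical, used only so that the decoded values can sit beside the originals.

```stata
clear
input district house person
1234 56 7
1000 1 11
1500 150 12
end
gen long newvar = (district * 100000) + (house * 100) + person

* Decode into new names so the originals remain for comparison
gen byte person2   = mod(newvar, 100)
gen int  house2    = mod(int(newvar/100), 1000)
gen int  district2 = int(newvar/100000)

* Stops with an error if any decoded value differs from its original
assert person2 == person & house2 == house & district2 == district
```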

I hope this helps.

David Kantor
Institute for Policy Studies
Johns Hopkins University
[email protected]
