


Re: st: Fwd: Constructing Household IDs

From   Nick Cox
Subject   Re: st: Fwd: Constructing Household IDs
Date   Wed, 30 Jan 2013 09:12:25 +0000

This kind of question is frequently asked. Long integer or long string
solutions require care to avoid problems of precision and/or storage.
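The precision point can be checked directly. The following is only a minimal sketch: the 12-digit value is the example ID quoted later in this thread, and float() is used to mimic what happens when such a value is stored in a float variable.

```stata
* float() rounds its argument to float precision
display %15.0f float(343213567143)
* the same literal evaluated in double precision
display %15.0f 343213567143

* floats hold every integer exactly only up to 2^24 = 16,777,216
assert float(16777217) != 16777217
* doubles hold every integer exactly only up to 2^53
assert 2^53 == 9007199254740992
```

A 12-digit ID therefore needs double storage (or a string); the default float type is not enough.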

The solutions

egen id = group(SSS TTT DDD HHH), label

and

egen id = concat(SSS TTT DDD HHH)

are frequently overlooked and apply to numeric and string variables alike.
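As a rough illustration of the two egen functions, here is a sketch on made-up data. The values, the new variable names id_group and id_concat, and the three-digit width in format() are all assumptions for the example, not part of the original question.

```stata
clear
input SSS TTT DDD HHH
1 1  1 1
1 1  1 2
1 1 11 1
1 2  1 1
2 1  1 1
end

* consecutive integers, one per distinct combination of the four variables;
* the label option records the component values in a value label
egen id_group = group(SSS TTT DDD HHH), label

* string ID made by joining the components;
* format() pads each numeric component with leading zeros
egen id_concat = concat(SSS TTT DDD HHH), format(%03.0f)

* each ID should identify the observations uniquely
isid id_group
isid id_concat
list
```

Without padding (or the punct() option), concat() can produce ambiguous results: components 1 and 11 would join to the same string as 11 and 1.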

See also the reference given at


On Wed, Jan 30, 2013 at 8:37 AM, Andrea Smurra wrote:

> Thanks to all.
> Nigussie, your method is definitely the most creative, as it requires
> nothing more than gen and some algebra. Thanks for your support.
> I ended up following Chamara's method, using the command
> gen str3 z=string(x, "%003,0f")
> and then
> gen hhid=state+district +...

 On 30/01/2013 14:20, Nigussie Tefera wrote:

>> Suppose each component has a three-digit identifier, i.e. state has
>> a maximum of three digits, and so on. For example, suppose you
>> have 343 for SSS, 213 for DDD, 567 for TTT … and finally 143 for
>> HHH. If you want to generate a unique household identifier of the form
>> 343213567143, you can use the following simple command:
>> gen double hhid=10^9*SSS+10^6*DDD+10^3*TTT+HHH
>> Note that the exponents of 10 may vary, depending on the maximum number
>> of digits in each component....
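The arithmetic approach can be sketched as follows, assuming (as in the example above) that every component is a whole number of at most three digits. The data are invented, and the foreach check is an addition, not part of the quoted suggestion.

```stata
clear
input SSS DDD TTT HHH
343 213 567 143
  1   2   3   4
end

* verify the three-digit assumption before combining
foreach v of varlist SSS DDD TTT HHH {
    assert inrange(`v', 0, 999) & `v' == floor(`v')
}

* double is needed: a 12-digit integer is beyond float precision
gen double hhid = 10^9*SSS + 10^6*DDD + 10^3*TTT + HHH

* show all 12 digits, with leading zeros
format hhid %012.0f

assert hhid == 343213567143 in 1
isid hhid
list
```

If any component can exceed three digits, the powers of 10 must be widened accordingly, and the total number of digits should stay within what a double can hold exactly.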

Chamara Anuranga

>> Check every identifier variable for its maximum length:
>> state may have at most 2 digits,
>> district may have 3 digits, etc.
>> Add leading zeros to the ID variables based on that maximum:
>> format state %02.0f
>> format district %03.0f
>> Here % introduces the format, 0 requests leading zeros, .0 means no
>> decimal places, and f means fixed format.
>> Then convert the variables to string:
>> tostring state district,replace usedis
>> Then combine the string parts using the generate command:
>> gen hhid=state+district
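A variant of the same idea that leaves the original numeric variables untouched is to build padded strings with string(). This is only a sketch: the data, the helper variables s and d, and the widths of 2 and 3 digits (taken from the example above) are assumptions.

```stata
clear
input state district
 1  1
 1 12
12  1
end

* string() with a leading-zero format pads each component to a fixed width
gen str2 s = string(state, "%02.0f")
gen str3 d = string(district, "%03.0f")

* concatenate the fixed-width pieces
gen hhid = s + d

assert hhid == "01001" in 1
assert hhid == "01012" in 2
assert hhid == "12001" in 3
isid hhid
list
```

Because every piece has a fixed width, the combined string cannot confuse, say, state 1 and district 12 with state 11 and district 2.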

>> On Wed, Jan 30, 2013 at 12:27 PM, Andrea Smurra wrote:

>>> I am working with a household survey which doesn't have household IDs.
>>> Each household is identified by a series of variables (state, district,
>>> township, ..., household number).
>>> Within each state, the numbering of districts always starts from the
>>> integer 1, and the same holds for towns within each district, and so on
>>> down to the household in each ward.
>>> I tried to build a unique HH identifier with the command "group", but I'd
>>> like to build a HH ID which looks like SSSDDDTTT...HHH,
>>> where SSS is the state identifier (with the correct number of leading
>>> zeros added when necessary; I don't know how to do that), DDD is the
>>> district identifier, and so on.
