
Re: AW: st: How to create a random number identifier number

From   Nick Winter <>
Subject   Re: AW: st: How to create a random number identifier number
Date   Thu, 12 Nov 2009 17:52:32 -0500

With the data still sorted by hhid, couldn't you simply replace each observation's random number with the random number of the first observation within its household? Immediately after generating the random sortvar:

by hhid: replace sortvar=sortvar[1]
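
A fuller sketch of the same idea, staying in the original dataset throughout. It assumes the household variable is named hhid and that there are fewer than 90,000 households. The seed value, the double-precision sortvar, and the use of -egen, group()- to turn the shared random number into an id are illustrative assumptions, not something taken from the earlier posts:

********* begin example
set seed 20091112                          // arbitrary seed, only for reproducibility
capture drop sortvar
gen double sortvar = uniform()             // one random draw per person
sort hhid
by hhid: replace sortvar = sortvar[1]      // whole household shares its first draw
egen newhhid = group(sortvar hhid)         // 1, 2, ... in random household order; hhid breaks any ties
replace newhhid = newhhid + 10000          // shift into the five-digit range: 10001, 10002, ...
bysort newhhid (hhid): assert hhid[1] == hhid[_N]   // each new id belongs to exactly one household
********* end example

Because every member of a household carries the same sortvar, -group()- numbers the households in the order of their random draws, so no collapse, save, or merge step is needed.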

- Nick Winter
Anna Reimondos wrote:
I successfully implemented the proposed solution and checked that
the new values were in fact unique identifiers. However, I then ran into
another problem when trying to do a similar thing for households!

Each of the 11,000 people live in households (around 5,000 households
in total) and there is a unique 5 digit household identifier which can
be used to see which people live in the same household. In other
words, several persons (identified by personid) live in the same
household (hhid). As I did for "personid", I would
also like to create a new household identifier that has five digits
and is unique.


person     hhid     "newhhid"
1          25643    13584
2          25643    13584
3          68534    34257

I tried modifying the code for the person id and applying it to the
household id, but this does not work: if I randomly sort on the
'sortvar' variable, the data lose the natural ordering in which members
of the same household sit on consecutive lines. My current solution
works, I think, but it means I keep only one line per household, save
off a new dataset, randomly sort it, create the new identifier and then
merge it back in. Would there be a way to do it while still "staying"
in the original dataset?

*Save dataset

capture drop sortvar                       // As before: random number for random sorting
gen sortvar=1 + int(12759*uniform())
replace sortvar=sortvar+10000 if sortvar<10000

bysort hhid: gen numbers=_N                // How many people live in the household
keep hhid numbers sortvar

bysort hhid: gen first=_n if _n==1         // Identify the first observation in each household
keep if first==1                           // Keep only one observation (the first) per household

sort sortvar                               // Randomly sort the data
gen newhhid=_n                             // New household id
replace newhhid=newhhid+10000 if newhhid<=10000   // Shift into the five-digit range

expand numbers                             // Expand so each household has as many rows as people in it
sort hhid

*Merge back this dataset using hhid, into the original dataset.

My original problem has been solved, and my current solution more or
less works, but I would be interested to hear if anyone has a more
elegant way of doing this.
Thanks very much,

On Fri, Nov 13, 2009 at 6:10 AM, Michael McCulloch <> wrote:
Thanks Martin. I imagine there's also a simpler (i.e. more elegant)
way to create the 5-digit new id than this?
       replace newpersonid=newpersonid+50000 if newpersonid<11000
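
One possibility, sketched under the assumption that there are at most 89,999 observations: since _n runs 1, 2, 3, ..., adding the constant 10000 to every row gives five-digit ids without any -if- condition. The random sort is left out here to keep the focus on the arithmetic; the 11,000 simulated rows mirror the dataset size mentioned in this thread.

********* begin example
clear
set obs 11000
gen long newpersonid = 10000 + _n          // 10001, 10002, ..., 21000
assert inrange(newpersonid, 10000, 99999)  // every id has exactly five digits
isid newpersonid                           // stops with an error if any id is repeated
********* end example

Any constant from 10000 up to 99999 minus the number of observations would work equally well.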

On Nov 12, 2009, at 12:11 AM, Martin Weiss wrote:


The -destring- line could easily be omitted, without loss of


-----Original Message-----
[] On behalf of Michael
Sent: Thursday, 12 November 2009 04:16
Subject: Re: st: How to create a random number identifier number

This simulated example is a better approach, that is faithful to your
need for the newpersonid to have 5 digits.

********* begin example
clear
set obs 11000
gen personid=_n+10000                      // simulated unique five-digit ids: 10001-21000
gen sortvar=1 + int(11000*uniform())       // random number for random sorting

replace sortvar=sortvar+10000 if sortvar<10000
sort sortvar

gen str5 newpersonid=string(_n)            // row number in the random order, as a string
destring newpersonid, replace
replace newpersonid=newpersonid+50000 if newpersonid<11000

list personid newpersonid in 10050/11000
********* end example

Dear Anna, if you sort on some variable other than personid, or
perform a random sort, you could then:
      gen new_personid = _n
This creates a variable which has a value equal to the sequence # of
that record, which is why you have to create some sort order other
than personid.

On Nov 11, 2009, at 6:37 PM, Anna Reimondos wrote:

I am experiencing problems creating a unique set of number for my

I have a dataset with around 11,000 subjects or persons, and each one
of these subjects has a unique identifier that is 5 digits long.
I need to create a concordance file that lists the original 5-digit
"personid" and matches it to a new, randomly created identifier
for each person. This new identifier (new_personid) also has to be 5
digits long.

personid   new_personid
10526        35624
18594        21893
54632        12489
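
For readers following along, a concordance file of this kind could be built roughly as follows. This is only a sketch on simulated data: the stand-in personid values, the seed, the helper variable u, and the file name concordance are all hypothetical.

********* begin example
clear
set obs 11000
gen long personid = 10000 + _n             // stand-in for the real five-digit ids
set seed 4711                              // arbitrary seed, so the mapping can be reproduced
gen double u = uniform()                   // random sort key
sort u                                     // put the persons in random order
gen long new_personid = 10000 + _n         // five-digit id unrelated to personid
isid new_personid                          // confirm the new ids are unique
keep personid new_personid                 // the concordance needs only these two columns
sort personid
save concordance, replace                  // hypothetical file name
********* end example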

I have tried playing around with the gen x = uniform() function, but
to no avail: I am unable to create exactly 11,000 unique numbers with
5 digits.
I also tried the egen x=seq() command, but then the ids are
sequential rather than random, and I am afraid that someone could then
figure out how to match the personid and the new_personid.

Any help would be much appreciated,


(Using STATA 10.1, Windows Vista)

*   For searches and help try:

Michael McCulloch
Pine Street Foundation
124 Pine Street
San Anselmo, CA 94960-2674
tel:  415-407-1357
fax:  206-338-2391


Nicholas Winter                                 434.924.6994 t
Assistant Professor                             434.924.3359 f
Department of Politics         e
University of Virginia w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904

© Copyright 1996–2014 StataCorp LP