Sorry, I made a mistake pasting in the code; it should read:

*-----------------------------------------------------------------------------------------------
*Save dataset

capture drop sortvar                    //As before- random number for random sorting
gen sortvar=1 + int(12759*uniform())
replace sortvar=sortvar+10000 if sort<10000

bysort hhid: gen numbers=_N             //How many people live in the household
keep hhid numbers sortvar

bysort hhid: gen first=_n if _n==1      //Identify the 1 observation in each household
keep if first==1                        //keep only 1 observation (first) per household

sort sortvar                            //randomly sort the data
gen newhhid =_n                         //new household Id
replace newhhid=newhhid+10000 if newhhid<=10000

expand numbers                          //Expand so each hh has as many rows as people in household

*Merge back this dataset using hhid, into the original dataset.
*-----------------------------------------------------------------------------------------------

On Fri, Nov 13, 2009 at 9:26 AM, Anna Reimondos <areimondos@gmail.com> wrote:
> I successfully implemented the solution proposed, and checked that
> these were in fact unique identifiers. However I then ran into another
> problem, when trying to do a similar thing for households!
>
> Each of the 11,000 people live in households (around 5,000 households
> in total) and there is a unique 5 digit household identifier which can
> be used to see which people live in the same household. In other
> words, several persons (identified by personid) live in the same
> household (hhid). In the same way as I did for the "personid" I would
> also like to create a new household identifier, that has five digits
> and is unique.
>
> Example:
>
> person   hhid    "newhhid"
> 1        25643   13584
> 2        25643   13584
> 3        68534   34257
>
> I tried modifying the code for the person id, and applying it to the
> household id but this does not work because I can't randomly sort them
> using the 'sortvar' variable, because it then loses the natural
> ordering of the same household being on consecutive lines.
> My current solution works I think but it means I keep only one line per
> household, save off a new dataset, randomly sort it, create the new
> identifier and then merge it back in. ...Would there be a way to do it,
> while still "staying" in the original dataset?
>
> *-----------------------------------------------------------------------------------------------
> *Save dataset
>
> capture drop sortvar                    //As before- random number for random sorting
> gen sortvar=1 + int(12759*uniform())
> replace sortvar=sortvar+10000 if sort<10000
>
>
> bysort hhid: gen numbers=_N             //How many people live in the household
> keep hhid numbers sortvar
>
> bysort ehhrhid: gen first=_n if _n==1   //Identify the 1 observation in each household
> keep if first==1                        //keep only 1 observation (first) per household
>
> sort sortvar                            //randomly sort the data
> gen newhhid =_n                         //new household Id
> replace newhhid=newhhid+100000 if newhhid<=10000
>
> expand numbers                          //Expand so each household has as many rows as people in household
> sort ehhrhid
>
> *Merge back this dataset using hhid, into the original dataset.
> *-----------------------------------------------------------------------------------------------
>
> My original problem has been solved, and my current solution kind of
> works but I would be interested to hear if any one has a more elegant
> way of doing this...
> Thanks very much,
> Anna
>
>
> On Fri, Nov 13, 2009 at 6:10 AM, Michael McCulloch <mm@pinest.org> wrote:
>> Thanks Martin. I imagine there's also a simpler (i.e. more elegant)
>> way to also create the 5-digit new id than this?:
>> replace newpersonid=newpersonid+50000 if newpersonid<11000
>>
>>
>> On Nov 12, 2009, at 12:11 AM, Martin Weiss wrote:
>>
>>>
>>> <>
>>>
>>>
>>> The -destring- line could easily be omitted, without loss of
>>> functionality...
>>>
>>>
>>>
>>> HTH
>>> Martin
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael
>>> McCulloch
>>> Sent: Thursday, 12 November 2009 04:16
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: How to create a random number identifier number
>>>
>>> Anna,
>>> This simulated example is a better approach, that is faithful to your
>>> need for the newpersonid to have 5 digits.
>>> Michael
>>>
>>> ********* begin example
>>> clear
>>> set obs 11000
>>> gen personid=_n
>>> replace personid=personid+10000 if personid<10000
>>> gen sortvar=1 + int(11000*uniform())
>>>
>>> replace sortvar=sortvar+10000 if sort<10000
>>> sort sortvar
>>>
>>> gen str5 newpersonid=string(_n)
>>> destring newpersonid, replace
>>> replace newpersonid=newpersonid+50000 if newpersonid<11000
>>>
>>> list personid newpersonid in 10050/11000
>>> codebook
>>> ********* end example
>>>
>>>
>>>
>>>
>>> Dear Anna, if you sort on some variable other than personid, or
>>> perform a random sort, you could then:
>>> gen new_personid = _n
>>> This creates a variable which has a value equal to the sequence # of
>>> that record, which is why you have to create some sort order other
>>> than personid.
>>> Michael
>>>
>>>
>>>
>>> On Nov 11, 2009, at 6:37 PM, Anna Reimondos wrote:
>>>
>>>> Hello,
>>>> I am experiencing problems creating a unique set of numbers for my
>>>> dataset.
>>>>
>>>> I have a dataset with around 11,000 subjects or persons, and each one
>>>> of these subjects has a unique identifier that is 5 digits long
>>>> (personid).
>>>> I need to create a concordance file which lists the original 5 digit
>>>> "personid" and matches this to another new randomly created
>>>> identifier
>>>> for each person. This new identifier (new_personid) also has to be 5
>>>> digits long.
>>>>
>>>> Example:
>>>> personid   new_personid
>>>> 10526      35624
>>>> 18594      21893
>>>> 54632      12489
>>>>
>>>> I have tried playing around with the gen x = uniform() function but
>>>> to no avail. I am unable to create exactly 11,000 unique numbers with
>>>> 5 digits.
>>>> I also tried just using the egen x=seq() command, but then the ids are
>>>> sequential and not random and I am afraid then perhaps someone could
>>>> figure out how to match the personid and the newperson id....
>>>>
>>>>
>>>> Any help would be much appreciated,
>>>>
>>>> Thanks
>>>> Anna
>>>>
>>>> (Using STATA 10.1, Windows Vista)
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/statalist/faq
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> Michael McCulloch
>>> Pine Street Foundation
>>> 124 Pine Street
>>> San Anselmo, CA 94960-2674
>>> tel: 415-407-1357
>>> fax: 206-338-2391
>>> mm@pinestreetfoundation.org
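
On the question raised in this thread of creating the new household identifier while "staying" in the original dataset, one possible approach is sketched below. It is only a sketch under assumptions: the data in memory hold one row per person with a numeric hhid, there are fewer than 90,000 households (so that 10000 + _n stays at five digits), and uniform() is used as in the posts above. The variable names first and u and the seed value are hypothetical.

```stata
* Sketch only: assumes one row per person and a numeric hhid in memory.
set seed 20091113                            // hypothetical seed, for reproducibility

bysort hhid: gen byte first = (_n == 1)      // tag one row per household
gen double u = uniform() if first            // random draw on the tagged rows only

sort u                                       // missing u sorts last, so the tagged
                                             // rows come first, in random order
gen long newhhid = 10000 + _n if first       // 10001, 10002, ... one per household

* Copy the household's new id to every member; within hhid the
* non-missing value sorts first.
bysort hhid (newhhid): replace newhhid = newhhid[1]

* Check that the mapping is one-to-one in both directions.
bysort hhid (newhhid): assert newhhid[1] == newhhid[_N]
bysort newhhid (hhid): assert hhid[1] == hhid[_N]

drop first u
```

Because the random number is drawn only on the one tagged row per household, each person's row stays in the dataset throughout, so no keep, expand, or merge step should be needed.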

