
From: Anna Reimondos <areimondos@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: AW: st: How to create a random number identifier number
Date: Fri, 13 Nov 2009 09:26:39 +1100

I successfully implemented the proposed solution and checked that the new identifiers were in fact unique. However, I then ran into another problem when trying to do the same thing for households.

Each of the 11,000 people lives in a household (around 5,000 households in total), and a unique 5-digit household identifier (hhid) shows which people live in the same household. In other words, several persons (identified by personid) share the same hhid. As I did for personid, I would also like to create a new household identifier that has five digits and is unique. Example:

person   hhid    newhhid
1        25643   13584
2        25643   13584
3        68534   34257

I tried adapting the code for the person ID to the household ID, but this does not work: I can't randomly sort on the 'sortvar' variable, because that loses the natural ordering in which members of the same household sit on consecutive lines. My current solution works, I think, but it means keeping only one line per household, saving that as a new dataset, sorting it randomly, creating the new identifier and then merging it back in. Is there a way to do this while still "staying" in the original dataset?
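One possible way to stay in the original dataset is sketched below. It is a suggestion under stated assumptions, not a tested answer for Anna's file: each household gets a single random draw, -by- copies that draw to every member, and -egen, group()- numbers the households in the order of their draws, so no rows are dropped and nothing has to be merged back. The simulated data, the seed value, and the names u and newhhid are illustrative assumptions; runiform() plays the role of the uniform() calls used elsewhere in this thread.

```stata
* Simulated stand-in for the real file (hypothetical): 11,000 people, 5,000 households
clear
set obs 11000
set seed 20091113                           // arbitrary seed, for reproducibility
gen long personid = 10000 + _n
gen long hhid = 20000 + ceil(_n*5/11)       // hypothetical hhid values 20001-25000

* One random draw per household, then copy it to every member with -by-
bysort hhid: gen double u = runiform() if _n == 1
by hhid: replace u = u[1]

* Number households 1, 2, ... in the order of their draws; hhid breaks any tie
egen long newhhid = group(u hhid)
replace newhhid = newhhid + 10000           // 10001-15000: always 5 digits

drop u
sort hhid personid                          // members stay on consecutive lines
list personid hhid newhhid in 1/6
```

Including hhid in group() guards against the unlikely case of two households drawing the same random number, which would otherwise give them the same newhhid.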
*-----------------------------------------------------------------------------------------------
*Save the original dataset first (the -keep- below discards everything else)
capture drop sortvar
//As before - random number for random sorting
gen sortvar=1 + int(12759*uniform())
replace sortvar=sortvar+10000 if sortvar<10000
bysort hhid: gen numbers=_N         //How many people live in the household
keep hhid numbers sortvar
bysort hhid: gen first=_n if _n==1  //Identify the first observation in each household
keep if first==1                    //keep only 1 observation (the first) per household
sort sortvar                        //randomly sort the data
gen newhhid=_n                      //new household ID
replace newhhid=newhhid+10000       //add 10000 so every ID has 5 digits
expand numbers                      //Expand so each household has as many rows as people in it
sort hhid
*Merge this dataset back into the original dataset using hhid.
*-----------------------------------------------------------------------------------------------

My original problem has been solved, and my current solution kind of works, but I would be interested to hear if anyone has a more elegant way of doing this.

Thanks very much,
Anna

On Fri, Nov 13, 2009 at 6:10 AM, Michael McCulloch <mm@pinest.org> wrote:
> Thanks Martin. I imagine there's also a simpler (i.e. more elegant)
> way to create the 5-digit new id than this?:
> replace newpersonid=newpersonid+50000 if newpersonid<11000
>
> On Nov 12, 2009, at 12:11 AM, Martin Weiss wrote:
>
>> <>
>>
>> The -destring- line could easily be omitted, without loss of
>> functionality...
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael
>> McCulloch
>> Sent: Thursday, 12 November 2009 04:16
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: How to create a random number identifier number
>>
>> Anna,
>> This simulated example is a better approach, one that is faithful to your
>> need for the newpersonid to have 5 digits.
>> Michael
>>
>> ********* begin example
>> clear
>> set obs 11000
>> gen personid=_n
>> replace personid=personid+10000
>> gen sortvar=1 + int(11000*uniform())
>>
>> replace sortvar=sortvar+10000 if sortvar<10000
>> sort sortvar
>>
>> gen str5 newpersonid=string(_n)
>> destring newpersonid, replace
>> replace newpersonid=newpersonid+50000 if newpersonid<11000
>>
>> list personid newpersonid in 10050/11000
>> codebook
>> ********* end example
>>
>> Dear Anna, if you sort on some variable other than personid, or
>> perform a random sort, you could then:
>> gen new_personid = _n
>> This creates a variable whose value equals the sequence number of
>> that record, which is why you have to create some sort order other
>> than personid.
>> Michael
>>
>> On Nov 11, 2009, at 6:37 PM, Anna Reimondos wrote:
>>
>>> Hello,
>>> I am experiencing problems creating a unique set of numbers for my
>>> dataset.
>>>
>>> I have a dataset with around 11,000 subjects or persons, and each one
>>> of these subjects has a unique identifier that is 5 digits long
>>> (personid).
>>> I need to create a concordance file which lists the original 5-digit
>>> "personid" and matches it to another, new, randomly created
>>> identifier for each person. This new identifier (new_personid) also
>>> has to be 5 digits long.
>>>
>>> Example:
>>> personid   new_personid
>>> 10526      35624
>>> 18594      21893
>>> 54632      12489
>>>
>>> I have tried playing around with the gen x = uniform() function but
>>> to no avail. I am unable to create exactly 11,000 unique numbers with
>>> 5 digits.
>>> I also tried just using the egen x=seq() command, but then the ids are
>>> sequential and not random, and I am afraid that someone could then
>>> figure out how to match the personid and the new_personid...
>>>
>>> Any help would be much appreciated,
>>>
>>> Thanks
>>> Anna
>>>
>>> (Using STATA 10.1, Windows Vista)
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>> Michael McCulloch
>> Pine Street Foundation
>> 124 Pine Street
>> San Anselmo, CA 94960-2674
>> tel: 415-407-1357
>> fax: 206-338-2391
>> mm@pinestreetfoundation.org
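For the original person-level question, and Michael's question about a simpler way to get five digits, here is a sketch under the same kind of assumptions (simulated personid values, an arbitrary seed, and the illustrative helper name u). A double-precision runiform() key avoids the many ties that 1 + int(11000*uniform()) produces, and adding a constant 10000 to _n gives every new ID exactly five digits with no -if- condition and no -destring- step.

```stata
* Simulated stand-in (hypothetical): 11,000 people with unique 5-digit personid
clear
set obs 11000
set seed 20091112                        // arbitrary seed, for reproducibility
gen long personid = 10000 + _n

* Random sort, then number the observations in that order
gen double u = runiform()
sort u personid                          // personid breaks any (unlikely) tie
gen long newpersonid = 10000 + _n        // 10001-21000: always 5 digits
drop u

isid newpersonid                         // stops with an error if not unique
sort personid
list personid newpersonid in 1/5
```

To write the concordance file Anna describes, -keep personid newpersonid- and -save- it under a name of your choice. The new IDs form the contiguous block 10001-21000; that reveals roughly how many people are in the file, but nothing about which personid maps to which newpersonid, as long as the seed is not published with the data.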
