Statalist



Re: AW: st: How to create a random number identifier number


From   Anna Reimondos <areimondos@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: AW: st: How to create a random number identifier number
Date   Fri, 13 Nov 2009 09:26:39 +1100

I successfully implemented the solution proposed, and checked that
the new identifiers were in fact unique. However, I then ran into another
problem when trying to do a similar thing for households!
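The post does not say how uniqueness was checked. One common way, sketched here on toy data (the variable name newpersonid is taken from the thread; the simulated values are an assumption), is -isid- together with -duplicates report-:

```stata
* Toy data standing in for the real concordance file (an assumption)
clear
set obs 11000
gen long newpersonid = 10000 + _n

* -isid- stops with an error if any value repeats or is missing
isid newpersonid

* -duplicates report- tabulates how many surplus copies exist, if any
duplicates report newpersonid

* Confirm every id has exactly five digits
assert inrange(newpersonid, 10000, 99999)
```

If -isid- returns without complaint, the variable uniquely identifies the observations.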

Each of the 11,000 people lives in a household (around 5,000 households
in total), and there is a unique 5-digit household identifier which can
be used to see which people live in the same household. In other
words, several persons (identified by personid) live in the same
household (hhid). In the same way as I did for "personid", I would
also like to create a new household identifier that has five digits
and is unique.

Example:

person   hhid    "newhhid"
1        25643   13584
2        25643   13584
3        68534   34257

I tried modifying the code for the person id and applying it to the
household id, but this does not work: if I randomly sort on the
'sortvar' variable, the members of a household no longer sit on
consecutive lines. My current solution works, I think, but it means I
keep only one line per household, save off a new dataset, randomly sort
it, create the new identifier and then merge it back in. Would there be
a way to do it while still "staying" in the original dataset?

*-----------------------------------------------------------------------------------------------
*Save dataset

*As before: random number for random sorting
capture drop sortvar
gen sortvar = 1 + int(12759*uniform())
replace sortvar = sortvar + 10000 if sortvar < 10000

*How many people live in the household
bysort hhid: gen numbers = _N
keep hhid numbers sortvar

*Identify the first observation in each household and keep only that one
bysort hhid: gen first = _n if _n == 1
keep if first == 1

*Randomly sort the households and number them
sort sortvar
gen newhhid = _n
*Add 10000 (not 100000) so that the new household id has five digits
replace newhhid = newhhid + 10000 if newhhid < 10000

*Expand so each household has as many rows as people in the household
expand numbers
sort hhid

*Merge back this dataset using hhid, into the original dataset.
*-----------------------------------------------------------------------------------------------
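One possible way to stay in the original dataset is sketched below. It is not a solution anyone in the thread proposed. It assumes the variable names used in the post (personid, hhid, newhhid); the helper variables first and u and the simulated data are hypothetical. The idea is to flag one person per household, put the flagged rows first in random order, number them, and then copy each number to the rest of the household.

```stata
* Toy data standing in for the real file (names and sizes are assumptions)
clear
set seed 20091113
set obs 11000
gen long personid = 10000 + _n
gen long hhid = 20000 + ceil(_n/2.2)          // about 5,000 households

* Flag one person per household and draw a random sort key
bysort hhid: gen byte first = (_n == 1)
gen double u = uniform()

* Flagged rows first, in random order; number them 10001, 10002, ...
* (five digits as long as there are fewer than 90,000 households)
gsort -first u
gen long newhhid = 10000 + _n if first

* Copy the new id to the other household members
* (missing values sort last, so the filled-in row is first within hhid)
bysort hhid (newhhid): replace newhhid = newhhid[1]

* Checks: no missing ids, and each new id belongs to exactly one hhid
assert !missing(newhhid)
bysort newhhid (hhid): assert hhid[1] == hhid[_N]

drop first u
```

Because the new id is the row position after the random sort, ties in the random key cannot produce duplicate ids, and no second dataset or merge is needed.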

My original problem has been solved, and my current solution more or
less works, but I would be interested to hear if anyone has a more
elegant way of doing this...
Thanks very much,
Anna


On Fri, Nov 13, 2009 at 6:10 AM, Michael McCulloch <mm@pinest.org> wrote:
> Thanks Martin. I imagine there's also a simpler (i.e. more elegant)
> way to create the 5-digit new id than this?:
>        replace newpersonid=newpersonid+50000 if newpersonid<11000
>
>
> On Nov 12, 2009, at 12:11 AM, Martin Weiss wrote:
>
>>
>> <>
>>
>>
>> The -destring- line could easily be omitted, without loss of
>> functionality...
>>
>>
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael
>> McCulloch
>> Sent: Thursday, 12 November 2009 04:16
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: How to create a random number identifier number
>>
>> Anna,
>> This simulated example is a better approach, that is faithful to your
>> need for the newpersonid to have 5 digits.
>> Michael
>>
>> ********* begin example
>> clear
>> set obs 11000
>> * simulated original ids: 5 digits, all distinct
>> gen personid=_n
>> replace personid=personid+10000
>> gen sortvar=1 + int(11000*uniform())
>>
>> replace sortvar=sortvar+10000 if sortvar<10000
>> sort sortvar
>>
>> * new id from the row position after the random sort
>> gen str5 newpersonid=string(_n)
>> destring newpersonid, replace
>> replace newpersonid=newpersonid+50000 if newpersonid<11000
>>
>> list personid newpersonid in 10050/11000
>> codebook
>> ********* end example
>>
>>
>>
>>
>> Dear Anna, if you sort on some variable other than personid, or
>> perform a random sort, you could then:
>>       gen new_personid = _n
>> This creates a variable which has a value equal to the sequence # of
>> that record, which is why you have to create some sort order other
>> than personid.
>> Michael
>>
>>
>>
>> On Nov 11, 2009, at 6:37 PM, Anna Reimondos wrote:
>>
>>> Hello,
>>> I am experiencing problems creating a unique set of numbers for my
>>> dataset.
>>>
>>> I have a dataset with around 11,000 subjects or persons, and each one
>>> of these subjects has a unique identifier that is 5 digits long
>>> (personid).
>>> I need to create a concordance file which list the original 5 digit
>>> "personid" and matches this to another new randomly created
>>> identifier
>>> for each person. This new identifier (new_personid) also has to be 5
>>> digits long.
>>>
>>> Example:
>>> personid   new_personid
>>> 10526        35624
>>> 18594        21893
>>> 54632        12489
>>>
>>> I have tried playing around with gen x = uniform() but
>>> to no avail. I am unable to create exactly 11,000 unique numbers with
>>> 5 digits.
>>> I also tried just using egen x = seq(), but then the ids are
>>> sequential and not random, and I am afraid that someone could then
>>> figure out how to match the personid and the new_personid....
>>>
>>>
>>> Any help would be much appreciated,
>>>
>>> Thanks
>>> Anna
>>>
>>> (Using Stata 10.1, Windows Vista)
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> Michael McCulloch
>> Pine Street Foundation
>> 124 Pine Street
>> San Anselmo, CA 94960-2674
>> tel:  415-407-1357
>> fax:  206-338-2391
>> mm@pinestreetfoundation.org
>>
