Statalist



Re: AW: st: How to create a random number identifier number


From   Anna Reimondos <areimondos@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: AW: st: How to create a random number identifier number
Date   Fri, 13 Nov 2009 09:38:39 +1100

Sorry, I made a mistake when pasting in the code. It should read:

*-----------------------------------------------------------------------------------------------
*Save dataset

capture drop sortvar                        //As before: random number for random sorting
gen sortvar=1 + int(12759*uniform())
replace sortvar=sortvar+10000 if sortvar<10000


bysort hhid: gen numbers=_N                 //How many people live in the household
keep  hhid numbers sortvar

bysort hhid: gen first=_n if _n==1          //Identify the first observation in each household
keep if first==1                            //Keep only one observation (the first) per household

sort sortvar                                  //randomly sort the data
gen newhhid =_n                           //new household Id
replace newhhid=newhhid+10000               //Five digits, given fewer than 90,000 households

expand numbers    //Expand so each hh has as many rows as people in household

*Merge back this dataset using hhid, into the original dataset.
*-----------------------------------------------------------------------------------------------
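
On staying in the original dataset: one possibility, offered as a rough sketch rather than a tested solution, is to draw a single random number per household, copy it to every member, and rank the households on it with -egen, group()-. The seed value and the helper variable u are made up for illustration, and the code assumes that hhid is never missing and that there are fewer than 90,000 households.

*-----------------------------------------------------------------------------------------------
* Sketch only: assumes hhid identifies households and is never missing
set seed 20091113                                 //arbitrary seed, for reproducibility
bysort hhid: gen double u = uniform() if _n == 1  //one random draw per household
bysort hhid (u): replace u = u[1]                 //missing sorts last, so u[1] is the draw
egen long newhhid = group(u hhid)                 //1, 2, ... in random order; hhid breaks ties
replace newhhid = newhhid + 10000                 //five digits for fewer than 90,000 households
drop u
*-----------------------------------------------------------------------------------------------

Because every member of a household carries the same u, they all receive the same newhhid, so no -save-, -expand- or -merge- step is needed. A check such as -bysort hhid (newhhid): assert newhhid[1]==newhhid[_N]- should pass afterwards.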

On Fri, Nov 13, 2009 at 9:26 AM, Anna Reimondos <areimondos@gmail.com> wrote:
> I successfully implemented the proposed solution, and checked that
> these were in fact unique identifiers. However, I then ran into another
> problem when trying to do a similar thing for households!
>
> Each of the 11,000 people live in households (around 5,000 households
> in total) and there is a unique 5 digit household identifier which can
> be used to see which people live in the same household. In other
> words, several persons (identified by personid) live in the same
> household (hhid). In the same way as I did for the "personid"  I would
> also like to create a new household identifier, that has five digits
> and is unique.
>
> Example:
>
> person  hhid     "newhhid"
> 1       25643    13584
> 2       25643    13584
> 3       68534    34257
>
> I tried modifying the code for the person id and applying it to the
> household id, but this does not work: if I randomly sort the data
> using the 'sortvar' variable, I lose the natural ordering in which
> members of the same household sit on consecutive lines. My current
> solution works, I think, but it means I keep only one line per
> household, save off a new dataset, randomly sort it, create the new
> identifier and then merge it back in. Would there be a way to do it
> while still "staying" in the original dataset?
>
> *-----------------------------------------------------------------------------------------------
> *Save dataset
>
> capture drop sortvar                        //As before: random number for random sorting
> gen sortvar=1 + int(12759*uniform())
> replace sortvar=sortvar+10000 if sort<10000
>
>
> bysort hhid: gen numbers=_N                 //How many people live in the household
> keep  hhid numbers sortvar
>
> bysort ehhrhid: gen first=_n if _n==1       //Identify the first observation in each household
> keep if first==1                            //Keep only one observation (the first) per household
>
> sort sortvar                                          //randomly sort the data
> gen newhhid =_n                                  //new household Id
> replace newhhid=newhhid+10000               //Five digits, given fewer than 90,000 households
>
> expand numbers    //Expand so each household has as many rows as people in household
> sort ehhrhid
>
> *Merge back this dataset using hhid, into the original dataset.
> *-----------------------------------------------------------------------------------------------
>
> My original problem has been solved, and my current solution more or
> less works, but I would be interested to hear if anyone has a more
> elegant way of doing this.
> Thanks very much,
> Anna
>
>
> On Fri, Nov 13, 2009 at 6:10 AM, Michael McCulloch <mm@pinest.org> wrote:
>> Thanks, Martin. I imagine there's also a simpler (i.e. more elegant)
>> way to create the 5-digit new id than this?:
>>        replace newpersonid=newpersonid+50000 if newpersonid<11000
>>
>>
>> On Nov 12, 2009, at 12:11 AM, Martin Weiss wrote:
>>
>>>
>>> <>
>>>
>>>
>>> The -destring- line could easily be omitted, without loss of
>>> functionality...
>>>
>>>
>>>
>>> HTH
>>> Martin
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael
>>> McCulloch
>>> Sent: Thursday, 12 November 2009 04:16
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: How to create a random number identifier number
>>>
>>> Anna,
>>> This simulated example is a better approach, that is faithful to your
>>> need for the newpersonid to have 5 digits.
>>> Michael
>>>
>>> ********* begin example
>>> clear
>>> set obs 11000
>>> gen personid=_n
>>> replace personid=personid+10000       // unconditional, so every id is unique and has 5 digits
>>> gen sortvar=1 + int(11000*uniform())
>>>
>>> replace sortvar=sortvar+10000 if sort<10000
>>> sort sortvar
>>>
>>> gen str5 newpersonid=string(_n)
>>> destring newpersonid, replace
>>> replace newpersonid=newpersonid+50000 if newpersonid<11000
>>>
>>> list personid newpersonid in 10050/11000
>>> codebook
>>> ********* end example
>>>
>>>
>>>
>>>
>>> Dear Anna, if you sort on some variable other than personid, or
>>> perform a random sort, you could then:
>>>       gen new_personid = _n
>>> This creates a variable which has a value equal to the sequence # of
>>> that record, which is why you have to create some sort order other
>>> than personid.
>>> Michael
>>>
>>>
>>>
>>> On Nov 11, 2009, at 6:37 PM, Anna Reimondos wrote:
>>>
>>>> Hello,
>>>> I am experiencing problems creating a unique set of numbers for my
>>>> dataset.
>>>>
>>>> I have a dataset with around 11,000 subjects or persons, and each one
>>>> of these subjects has a unique identifier that is 5 digits long
>>>> (personid).
>>>> I need to create a concordance file which lists the original 5 digit
>>>> "personid" and matches this to another new randomly created
>>>> identifier
>>>> for each person. This new identifier (new_personid) also has to be 5
>>>> digits long.
>>>>
>>>> Example:
>>>> personid   new_personid
>>>> 10526        35624
>>>> 18594        21893
>>>> 54632        12489
>>>>
>>>> I have tried playing around with the gen x = uniform() function, but
>>>> to no avail. I am unable to create exactly 11,000 unique numbers with
>>>> 5 digits.
>>>> I also tried just using the egen x=seq() command, but then the ids are
>>>> sequential and not random, and I am afraid that someone could then
>>>> figure out how to match the personid and the new_personid.
>>>>
>>>>
>>>> Any help would be much appreciated,
>>>>
>>>> Thanks
>>>> Anna
>>>>
>>>> (Using Stata 10.1, Windows Vista)
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> Michael McCulloch
>>> Pine Street Foundation
>>> 124 Pine Street
>>> San Anselmo, CA 94960-2674
>>> tel:  415-407-1357
>>> fax:  206-338-2391
>>> mm@pinestreetfoundation.org
>>>
>>
>>
>>
>> Michael McCulloch
>> Pine Street Foundation
>> 124 Pine Street
>> San Anselmo, CA 94960-2674
>> tel:    415-407-1357
>> fax:    206-338-2391
>> mm@pinestreetfoundation.org
>>
>
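
On the question of a simpler way to get a 5-digit person id: a minimal sketch, assuming personid uniquely identifies each row and there are fewer than 90,000 rows, is to sort on a single random draw and add a fixed offset to _n. The seed value and the helper variable u are made up for illustration.

*-----------------------------------------------------------------------------------------------
* Sketch only: assumes personid is unique and there are fewer than 90,000 rows
set seed 20091113                     //arbitrary seed, so the concordance can be reproduced
gen double u = uniform()              //one random draw per person
sort u personid                       //personid breaks any ties in u
gen long new_personid = 10000 + _n    //10001, 10002, ... so always five digits
drop u
isid new_personid                     //stops with an error if the new id is not unique
*-----------------------------------------------------------------------------------------------

Adding the offset to every row, rather than only to rows below some cut-off, avoids the risk of a shifted value colliding with an unshifted one.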



