Statalist The Stata Listserver



RE: st: RE: RE: Encryption of data


From   "Hendri Adriaens" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: RE: Encryption of data
Date   Wed, 13 Jun 2007 18:23:18 +0200

Hi William,

Thanks, that should work, although, as Nick Cox mentioned, there is a tiny
probability of generating the same random number twice. So one might need to
check for duplicates afterwards and, if any are found, redo the process with
a different seed.
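
A check along those lines could be sketched as follows. This is only a
minimal sketch under assumptions, not part of the quoted recipe: the toy uid
values and the seed are made up, and runiform() is the current name for
uniform(). isid exits with an error when the named variable does not uniquely
identify the observations, so a tie in random stops the do-file before newid
is created.

```stata
* Sketch only: toy uid values stand in for the real identifiers
clear
set obs 1000
gen long uid = 100000 + _n

set seed 12345                   // hypothetical seed, chosen for illustration
gen double random = runiform()   // runiform() is the current name for uniform()

* Stop with an error if any two random draws are equal
isid random

sort random
gen long newid = _n

* newid should identify each uid uniquely
isid newid
```

If isid random fails, choose a different seed and rerun. Since newid is
assigned as _n, it stays unique even when two draws tie; as far as I can
tell, a tie only makes the order of the tied rows, and hence the mapping,
impossible to reproduce from the seed alone.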

But thanks for your help, best regards,
-Hendri. 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> William Gould, StataCorp LP
> Sent: woensdag 13 juni 2007 17:54
> To: [email protected]
> Subject: Re: st: RE: RE: Encryption of data
> 
> Hendri Adriaens <[email protected]> has a dataset and writes, 
> 
> > I want to encrypt only a single variable, to anonymize data.
> 
> Here is what I recommend.
> 
> Let's call the data actual.dta and assume it has variable uid, which is
> the official user identification number that we want to encrypt.
> uid can be a string or numeric, I don't care.  uid might contain
> 
>         136980408          recorded as a double or long, or 
>         "136-98-408"       recorded as a string, or even 
>         "James Smith"      recorded as a string.
> 
> In what follows, we will allow repeated values of uid in the
> dataset.  What we are going to do is come up with new id numbers, use those,
> and lock up the mapping from uid to newid.
> 
> Here's step 1:
> 
>         . use actual, clear 
>         . keep uid
>         . sort uid
>         . by uid: keep if _n==1
> 
>         . set seed _______            <- fill this in with a random number
>         . gen double random = uniform()
>         . sort random 
>         . gen long newid = _n
> 
>         . sort uid
>         . save mapping, replace
> 
> New dataset mapping.dta contains uid and the corresponding newid (plus the
> helper variable random, which can be dropped before saving).  Next, we fix
> actual.dta for public consumption:
> 
>         . use actual 
>         . sort uid 
>         . merge uid using mapping
>         . assert _merge==3
>         . drop _merge uid
>         . save actual, replace
> 
> Finally, we put mapping.dta in a safe place.  I would write multiple copies
> of mapping.dta on multiple CDs and put the CDs in multiple safes.  Dataset
> mapping contains all the secret information.
> 
> Dataset actual.dta no longer contains uid; it contains newid.
> 
> -- Bill
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
> 
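
For readers on more recent Stata releases, the quoted two-step recipe might
be sketched as below. This is a sketch under assumptions, not Bill's code:
the toy data, the variable x, the seed, and the temporary files are all made
up for illustration. It uses the newer merge syntax with an explicit m:1
match type, where assert(match) plays the role of "assert _merge==3" and
nogenerate suppresses _merge.

```stata
* Sketch only: toy data stand in for actual.dta; tempfiles stand in for the .dta files
clear
input long uid double x
136980408 1.5
136980408 2.5
222334444 3.0
555667777 4.0
end
tempfile actual mapping
save `actual'

* Step 1: one row per uid, numbered in random order
keep uid
duplicates drop
set seed 2007                    // hypothetical seed; keep the real one secret
gen double random = runiform()   // current name for uniform()
sort random
gen long newid = _n
drop random                      // mapping keeps only uid and newid
save `mapping'

* Step 2: attach newid to every record, then remove uid
use `actual', clear
merge m:1 uid using `mapping', assert(match) nogenerate
drop uid

* Re-identification is possible only for whoever holds the mapping
merge m:1 newid using `mapping', assert(match) nogenerate
```

The last merge is there only to show that the mapping file alone is enough to
recover uid, which is why it, and not the public dataset, needs to be locked
away.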




