Statalist The Stata Listserver



RE: st: RE: RE: Encryption of data


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: RE: Encryption of data
Date   Wed, 13 Jun 2007 17:31:31 +0100

I am as fond of -duplicates- as my twin, but it 
is just a convenience command. 

bysort random : assert _N == 1 

is a much more direct way of testing that random
numbers are unique. 
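
A minimal sketch of that check on made-up data. The 1,000 observations
and the seed are assumptions for illustration, and runiform() is the
current name for uniform(). -isid- is shown as an equivalent built-in
test.

```stata
* Illustrative data only: 1,000 observations, arbitrary seed
clear
set obs 1000
set seed 12345
generate double random = runiform()

* Each value of random must form a group of exactly one observation;
* -assert- exits with an error otherwise
bysort random : assert _N == 1

* Equivalent check: -isid- errors unless random uniquely
* identifies the observations
isid random
```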

Nick 
[email protected] 

Hendri Adriaens
 
> Hi William,
> 
> Thanks, that should work, although, as Nick Cox mentioned, there is a
> tiny probability that you generate the same number twice. So, one
> might need to check for duplicates afterwards and redo the process
> with a different seed if any are found.
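
The check-and-retry idea could be sketched as follows. The toy dataset
size, the starting seed, and the rule of bumping the seed by one are
assumptions for illustration, not something stated in the thread.

```stata
* Illustrative data only
clear
set obs 1000

local seed 20070613            // hypothetical starting seed
local done 0
while !`done' {
    set seed `seed'
    capture drop random        // no-op on the first pass
    generate double random = runiform()

    * -isid- returns a nonzero code if random has duplicates
    capture isid random
    if _rc == 0 {
        local done 1
    }
    else {
        local ++seed           // try again with a different seed
    }
}
display "seed used: `seed'"
```

In practice the loop should almost always finish on the first pass; it
only guards against the rare tie.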
 
William Gould, StataCorp LP

> > Hendri Adriaens <[email protected]> has a dataset and writes, 
> > 
> > > I want to encrypt only a single variable, to anonymize data.
> > 
> > Here is what I recommend.
> > 
> > Let's call the data actual.dta and assume it has variable uid, which
> > is the official user identification number that we want to encrypt.
> > uid can be a string or numeric, I don't care.  uid might contain
> > 
> >         136980408          recorded as a double or long, or 
> >         "136-98-408"       recorded as a string, or even 
> >         "James Smith"      recorded as a string.
> > 
> > In what follows, we will allow repeated values of uid in the
> > dataset.  What we are going to do is come up with new id numbers,
> > use those, and lock up the mapping from uid to newid.
> > 
> > Here's step 1:
> > 
> >         . use actual, clear 
> >         . keep uid
> >         . sort uid
> >         . by uid: keep if _n==1
> > 
> >         . set seed _______            <- fill this in with a random number
> >         . gen double random = uniform()
> >         . sort random 
> >         . gen long newid = _n
> > 
> >         . drop random
> >         . sort uid
> >         . save mapping, replace
> > 
> > New dataset mapping.dta contains two variables:  uid and the
> > corresponding newid.  Next, we fix actual.dta for public consumption:
> > 
> >         . use actual 
> >         . sort uid 
> >         . merge uid using mapping
> >         . assert _merge==3
> >         . drop _merge uid
> >         . save actual, replace
> > 
> > Finally, we put mapping.dta in a safe place.  I would write multiple
> > copies of mapping.dta on multiple CDs and put the CDs in multiple
> > safes.  Dataset mapping contains all the secret information.
> > 
> > Dataset actual.dta no longer contains uid; it contains newid.
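
For readers on current Stata, the two steps above might be sketched as
below. This is an adaptation under stated assumptions, not the code
from the post: the toy data, the variable score, the seed, and the
output name actual_public.dta are all hypothetical; runiform() and
-merge m:1- replace the older uniform() and -merge uid using- syntax;
and the public copy is saved under a new name rather than overwriting
actual.dta.

```stata
* Hypothetical toy data standing in for actual.dta
clear
input str11 uid score
"James Smith" 10
"James Smith" 12
"Ann Jones"    7
"Lee Wong"     9
end
save actual, replace

* Step 1: one row per uid, shuffled, numbered 1..k
use actual, clear
keep uid
bysort uid : keep if _n == 1
set seed 987654321             // assumption: pick your own and keep it secret
generate double random = runiform()
isid random                    // stop if two uids drew the same number
sort random
generate long newid = _n
drop random
sort uid
save mapping, replace

* Step 2: attach newid to every record and drop the real identifier
use actual, clear
merge m:1 uid using mapping, assert(match) nogenerate
drop uid
save actual_public, replace
```

Anyone holding mapping.dta could reverse the process by merging it back
on newid, which is why that file is the one to lock away.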
