Statalist The Stata Listserver



RE: RE: st: RE: RE: Encryption of data


From   "Maarten Buis" <[email protected]>
To   <[email protected]>
Subject   RE: RE: st: RE: RE: Encryption of data
Date   Fri, 15 Jun 2007 11:33:57 +0200

--- Hendri Adriaens wrote:
> I was talking about the situation that:
>
>        uid    newid
> 123-45-6789      100
> 999-99-9999      100

With Bill's solution you will never get into that 
situation; it is logically impossible. So there 
is no need to draw two random numbers or to do 
anything more than Bill told you. 

The reason is that Bill created newid as follows:
gen double random = uniform()
sort random 
gen long newid = _n

So, the random draws created by uniform() are 
only used to order the data, and then newid is 
assigned the current observation number: the 
first observation gets 1, the second 2, the third 
3, and so on. Hence it is impossible to get ties. 
Nick pointed out that there is a very small but 
nonzero probability that uniform() produces ties, 
but that is irrelevant here: -sort- orders tied 
observations randomly, so each of them still gets 
a distinct value of newid. 
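A quick way to check this in Stata is to run Bill's three lines and then ask -isid- whether newid uniquely identifies the observations. The sketch below assumes the auto data shipped with Stata; the seed is arbitrary and only makes the run reproducible. It also forces the extreme case in which every observation has the same sort key, to show that numbering with _n after -sort- still gives distinct ids.

```stata
* Sketch: newid = _n is unique whether or not the random draws are tied
sysuse auto, clear
set seed 12345                 // arbitrary seed, for reproducibility only

* Bill's three lines
gen double random = uniform()
sort random
gen long newid = _n

duplicates report random       // reports whether any draws happen to be tied
isid newid                     // stops with an error if newid is not unique

* Extreme case: every observation has the same sort key
gen byte tied = 1
sort tied
gen long newid2 = _n
isid newid2                    // still unique: _n differs for every observation
```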

I think that what you want to do is split a file 
into two pieces and hide which observations belong 
together: for instance, one file contains city of 
residence and the other occupation, and you don't 
want people to be able to match the two (unless 
you have given them permission and the key). The 
example below does that. (Note that in your case I 
wouldn't use -tempfile-; you want to make doubly 
sure that you save files a, b, and key in several 
different secure places. I use -tempfile- here only 
so that I won't fill my hard drive with all kinds 
of example datasets.)
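For real data, the -save- commands would point at permanent files rather than tempfiles. A minimal sketch, assuming you want the same file written to more than one place: the directories secure1 and secure2 are hypothetical placeholders, created under Stata's temporary directory (c(tmpdir)) only so that the sketch runs as is. Substitute your own secure locations.

```stata
* Sketch: save the key to several (hypothetical) locations instead of a tempfile
sysuse auto, clear
gen double random = uniform()
sort random
gen long newid = _n
keep make newid

forvalues i = 1/2 {
    * hypothetical placeholder directory; replace with a real secure location
    local dir "`c(tmpdir)'/secure`i'"
    capture mkdir "`dir'"
    save "`dir'/key.dta", replace
}
```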

*----------- begin example -------------------
sysuse auto, clear
tempfile a b key full

/* create key */
gen double random = uniform()
sort random
gen long newid = _n
preserve 
keep make newid
save `key'
list in 1/10
restore

/* create file a */
preserve
keep newid mpg foreign
save `a'
restore

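/*create file b: also drop random, and sort by make
  before saving, so that neither that variable nor
  the row order can be used to line b up with a*/
drop newid mpg foreign random
sort make
save `b'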

/*describe files a and b*/
desc

use `a', clear
desc
list newid in 1/10

/*merge files a and b*/
use `key', clear
sort make
save `key', replace

use `b', clear
sort make
merge make using `key'
tab _merge
drop _merge
sort newid
save `full'

use `a', clear
sort newid
merge newid using `full'
tab _merge

desc
*------------ end example --------------------
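Since Stata 11, -merge- has a newer syntax in which you state the match type (1:1, m:1, ...) and the data no longer need to be sorted beforehand; the old syntax used above is still accepted. The following is a sketch of the same split-and-rejoin in the newer syntax, assuming Stata 11 or later and the auto data (runiform() is the newer name for uniform()). It drops random from file b and sorts b by make before saving, so that nothing in b reproduces the newid order of file a.

```stata
* Sketch (Stata 11+): split the auto data into a, b, and key, then rejoin them
sysuse auto, clear
tempfile a b key full
set seed 12345                        // arbitrary, for reproducibility only

* create the key: a random order, numbered 1, 2, 3, ...
gen double random = runiform()
sort random
gen long newid = _n
preserve
keep make newid
save `key'
restore

* file a: only the anonymous id plus its share of the variables
preserve
keep newid mpg foreign
save `a'
restore

* file b: drop random as well, and re-sort by make before saving,
* so that nothing in b reproduces the newid order of file a
drop newid mpg foreign random
sort make
save `b'

* rejoin: b with the key on make, then the result with a on newid
use `b', clear
merge 1:1 make using `key'
assert _merge == 3
drop _merge
merge 1:1 newid using `a'
assert _merge == 3
drop _merge
save `full'
describe
```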
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )


Hope this helps,
Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434 

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------





