Statalist



Re: st: AW: utility to create fake dataset?


From   Daljit Dhadwal <ddhadwal@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: AW: utility to create fake dataset?
Date   Sun, 8 Nov 2009 10:06:23 -0800

It sounds like you're trying to create anonymized data sets.  There
are lots of different names for the techniques for doing this: data
masking, data anonymization, data obfuscation, data de-identification,
data depersonalization, data scrubbing, and data scrambling.

Here’s the Wikipedia article on data masking:
http://en.wikipedia.org/wiki/Data_masking

Here’s a good PowerPoint presentation that discusses some of the
techniques used in data masking:
http://www.cs.uky.edu/events/dmSec08.ppt
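
To make one of these techniques concrete, here is a minimal sketch of
data scrambling: each column is shuffled independently, so every
variable keeps its own distribution but no fake row corresponds to a
real record. It is written in Python rather than Stata purely for
illustration; the function name scramble_columns and the toy patient
records are hypothetical and do not come from any existing utility.

```python
import random


def scramble_columns(rows, seed=None):
    """Return new rows in which every column has been shuffled independently.

    rows is a list of dicts that all share the same keys.
    The input list is left unchanged.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    if not rows:
        return []
    columns = list(rows[0].keys())
    scrambled = [dict() for _ in rows]
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # a different permutation is drawn for each column
        for new_row, value in zip(scrambled, values):
            new_row[col] = value
    return scrambled


# Hypothetical toy data standing in for the real, restricted dataset.
patients = [
    {"id": 1, "age": 34, "sex": "F", "sbp": 118},
    {"id": 2, "age": 51, "sex": "M", "sbp": 142},
    {"id": 3, "age": 47, "sex": "F", "sbp": 131},
    {"id": 4, "age": 62, "sex": "M", "sbp": 155},
    {"id": 5, "age": 29, "sex": "F", "sbp": 109},
]

fake = scramble_columns(patients, seed=2009)
for row in fake:
    print(row)
```

Note that shuffling alone breaks the relationships between variables
and still exposes the original values (a rare value remains visible),
so on its own it is not a privacy guarantee.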

Thanks,

Daljit


On Sun, Nov 8, 2009 at 9:32 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
>
>
> *************
> h clonevar
> *************
>
> comes to mind...
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin
> Sent: Sunday, 8 November 2009 18:20
> To: statalist@hsphsun2.harvard.edu
> Subject: st: utility to create fake dataset?
>
>
> I sometimes need to create a "fake" dataset that "looks"
> like an existing dataset. For example, a dataset that
> must, for health privacy reasons, remain on a remote server,
> and I would like to develop code locally to run on it.
> Or, I need to make mock tables to share with colleagues
> who need to remain blinded for now to actual study data.
>
> Usually, I just do something that seems "good enough", like
> sample 5%, expand 20, replace values with random values, etc.
> Or, in an extreme case, set obs to be twice the existing obs
> and keep the ones with missing data. But the first is not
> very satisfying when I need to reassure higher powers that
> I have a "dummy" dataset, and the second is not very helpful
> for writing final useable code.
>
> So, I'm thinking I'll write a utility to create a 'dummy'
> dataset from an existing dataset, but wondered if there was
> something out there already. Perhaps there is even a well
> established name for this process? My searches for "dummy"
> and "fake" dataset have not been fruitful.
>
> thanks,
> Jeph
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
