Re: st: Adding randomness to a variable

From   Richard Williams
Subject   Re: st: Adding randomness to a variable
Date   Mon, 21 Oct 2013 11:41:20 -0500

At 10:04 AM 10/21/2013, Owen Gallupe wrote:

Given the random number generator capabilities of Stata, I suspect
there is an easy solution to this which I just haven't managed to
track down. Having said that, is there any function that allows you to
take an existing variable and add a small degree of randomness to it?
I'm thinking along the lines of a jitter option when generating a
variable. I know that this exact command doesn't actually exist, but a
command of the following form is what I'm looking for:

gen varx = jitter(var)

My idea is that it would take this:

And turn it into something like this:

I'm aware that the following two options would produce something
similar, but my idea is to manually create a variable that has the
exact properties I want for teaching purposes but then add a little
"error" to it.

gen varx = .5*var1 + .8660254*var2

matrix c = (1.00, 0.30, -0.25, -0.10, 0.10, 0.20 \ ///
0.30, 1.00, -0.15, -0.10, 0.12, 0.35 \ ///
-0.25, -0.15, 1.00, 0.13, -0.08, -0.16 \ ///
-0.10, -0.10, 0.13, 1.00, 0.06, -0.14 \ ///
0.10, 0.12, -0.08, 0.06, 1.00, 0.001 \ ///
0.20, 0.35, -0.16, -0.14, 0.001, 1.00)
corr2data var1 var2 var3 var4 var5 var6, n(2000) corr(c)

I've used the corr2data approach to create vars like e1 and e2 that were uncorrelated with anything else, and then added them to the other vars I had created. See (especially page 2)
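As a rough sketch of that idea (the variable names x, e1 and y, the 0.5 weight, and the sample size are illustrative assumptions, not taken from the original handout), one could create a predictor and an error term that are uncorrelated by construction and then combine them:

```stata
* Sketch only: names, weight and n are made up for illustration
clear
matrix C = (1, 0 \ 0, 1)            // x and e1 specified as uncorrelated
corr2data x e1, n(2000) corr(C)     // sample correlation matches C
gen y = x + 0.5*e1                  // y is x plus some "error"
corr x e1                           // essentially zero, up to rounding
corr x y
```

Because corr2data reproduces the requested correlation matrix in the sample itself, e1 is uncorrelated with x in these particular 2000 cases, not just in expectation.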

For existing data you can also do stuff like

gen x2 = x + rnormal()

That will add random noise to x, but corr2data is better if you want EXACT properties: by chance alone, the randomness you add above could be (and usually will be) slightly correlated with the original x.
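A minimal sketch of the rnormal() approach, using simulated data so it runs on its own (the seed, the mean and standard deviation of x, and the noise standard deviation of 2 are all arbitrary assumptions):

```stata
* Sketch only: seed, n and the sd values are arbitrary choices
clear
set seed 12345                      // makes the random draws reproducible
set obs 2000
gen x = rnormal(50, 10)             // stand-in for an existing variable
gen x2 = x + rnormal(0, 2)          // add noise with mean 0 and sd 2
gen noise = x2 - x
corr x x2                           // high, but below 1
corr x noise                        // near zero, but not exactly zero
```

The second argument of rnormal() controls how much "jitter" is added; the last correlation shows the chance association between the noise and x mentioned above.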

Instead of corr2data, consider using drawnorm if you want to be sampling from a population with known properties, rather than creating a population with the exact properties.
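The difference between the two commands can be sketched as follows (variable names, the target correlation of 0.5, the seed and the sample size are illustrative assumptions):

```stata
* Sketch only: names, seed and the 0.5 target are made up
clear
matrix C = (1, 0.5 \ 0.5, 1)

* corr2data: the sample itself has the requested correlation
corr2data a1 a2, n(2000) corr(C)
corr a1 a2
scalar r_exact = r(rho)             // saved for comparison

* drawnorm: a random sample from a population with that correlation
clear
set seed 12345
drawnorm b1 b2, n(2000) corr(C)
corr b1 b2
scalar r_sample = r(rho)

display "corr2data: " r_exact "   drawnorm: " r_sample
```

With corr2data the displayed correlation should match the target up to rounding, while with drawnorm it should vary around the target from one seed to the next.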