Note: The following material is based on a question and answer that
appeared on Statalist. This FAQ is
for users of Stata 6, an older version of Stata. It is not relevant for more
recent versions.

Title | Stata 6: Simulating multivariate normal observations | |

Author | William Gould, StataCorp | |

Date | May 1995 |

Pretend one had random variates C = (c1, c2, ..., ck) that are all orthogonal. One rotates this space via

a11 a12 ... a1k Y = AC, A = a21 a22 ... a2k ... ak1 ak2 ... akk

where **A** is an arbitrary but full-rank matrix. Thus we could

. gen c1 = invnorm(uniform()) . gen c2 = invnorm(uniform()) etc.

and then

gen y1 = a11*c1 + a12*c2 + ... + a1k*ck gen y2 = a21*c1 + a22*c2 + ... + a2k*ck

if only we knew the values of **A**. We want to find values of **A**
such that

cov(y1,y2) = as desired cov(y1,y3) = as desired etc.

Let **P** be the desired covariance matrix. The value of **A** we
seek is the square root (Cholesky decomposition) of **P** because we want

P = A'Var(C)A = A'A

**P** can just as well be the correlation matrix—we just have to put
1s on the diagonal.

Thus I could generate 2 variables with correlation .5:

. clear . matrix P = (1,.5 \.5, 1) . mat A = cholesky(P) . mat list AA[2,2] c1 c2 r1 1 0 r2 .5 .8660254. set obs 4000obs was 0, now 4000. gen c1= invnorm(uniform()) . gen c2= invnorm(uniform()) . gen y1 = c1 . gen y2 = .5*c1 + .8660254*c2 . corr y1 y2(obs=4000) | y1 y2 --------+------------------ y1| 1.0000 y2| 0.5079 1.0000

Now the names **c1**, **c2,**, ..., **ck** for the orthogonal
random variates were not chosen at random. Since that is the way Stata, by
default, labels the columns of matrices, we could have generated **y1**
and **y2** using the matrix score function:

. matrix a1 = A[1,.] . matrix score y1 = a1 . matrix a2 = A[2,.] . matrix score y2 = a2

This is handy because (1) we do not have to type the coefficients, and (2)
how much we have to type on a line is not a function of the number of
correlated variables we are generating. In fact, given a matrix **P**,
we could make a crude program to generate **y1**, **y2**, ...:

program define mkcorr version 4.0 local k = rowsof(P) matrix A = cholesky(P) local i 1 quietly { while `i'<=`k' { gen c`i' = invnorm(uniform()) local i=`i'+1 } local i 1 while `i'<=`k' { matrix row = A[`i',.] matrix score y`i' = row local i=`i'+1 } local i 1 while `i' <= `k' { drop c`i' local i=`i'+1 } } end

This program is crude. The name of the covariance (correlation) matrix is
prerecorded—it’s **P**; the names of the output variables are
unsettable—they are **y1**, **y2**, ...; and the names of the
working variables are preset -- they are **c1**, **c2**, ....

It does, however, work.

. clear . matrix P = (1,.25,.5 \ .25, 1, .75 \ .5, .75, 1) . set obs 4000obs was 0, now 4000. mkcorr . summVariable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------- y1 | 4000 .0095433 .9973219 -3.053706 4.167702 y2 | 4000 -.0059072 .9962081 -3.341381 3.419445 y3 | 4000 -.0086885 1.009847 -3.448567 4.026851. corr(obs=4000) | y1 y2 y3 --------+--------------------------- y1| 1.0000 y2| 0.2664 1.0000 y3| 0.5206 0.7441 1.0000

Obviously, someone could make **mkcorr** fancier.