# st: RE: Simulating multilevel data in Stata

 From "Feiveson, Alan H. (JSC-SK311)" To Subject st: RE: Simulating multilevel data in Stata Date Tue, 23 Dec 2008 11:15:07 -0600

```Jim -

If I understand your question, you want to "fix" your randomly generated
X1, X2 and u so that their sample covariance matrix exactly equals the
one you want. Here's one way to do this (see below.

Al Feiveson

. matrix V = I(3)
. matrix V[1,2]=.5
. matrix V[2,1]=.5
. matrix V[1,3]=.3
. matrix V[3,1]=.3

. matrix list V

symmetric V[3,3]
c1  c2  c3
r1   1
r2  .5   1
r3  .3   0   1

. set obs 50
obs was 0, now 50

. drawnorm X1 X2 u,cov(V)

. corr X1 X2 u,cov
(obs=50)

|       X1       X2        u
-------------+---------------------------
X1 |  1.01209
X2 |  .467516  .953286
u |  .571851  .073683  1.12376

. matrix accum A = X1 X2 u,dev noc
(obs=50)

. matrix A=(1/49)*A

. matrix list A

symmetric A[3,3]
X1         X2          u
X1  1.0120898
X2  .46751642  .95328608
u   .5718511  .07368264   1.123759

. matrix H=cholesky(V)
. matrix G=cholesky(A)
. matrix GI=inv(G)

. matrix HGI=H*GI

. matrix list HGI

HGI[3,3]
X1          X2           u
r1   .99400935           0           0
r2   .03111956   1.0085584           0
r3  -.34919894   .07784363    1.082162

. des

Contains data
obs:            50
vars:             3
size:           800 (99.9% of memory free)
------------------------------------------------------------------------
---------------------------
storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
---------------------------
X1              float  %9.0g
X2              float  %9.0g
u               float  %9.0g
------------------------------------------------------------------------
---------------------------
Sorted by:
Note:  dataset has changed since last saved

. gen y1=HGI[1,1]*X1
. gen y2=HGI[2,1]*X1+HGI[2,2]*X2
. gen y3=HGI[3,1]*X1+HGI[3,2]*X2+HGI[3,3]*u

. corr y*,cov
(obs=50)

|       y1       y2       y3
-------------+---------------------------
y1 |        1
y2 |       .5        1
y3 |       .3 -9.1e-10        1

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of James Shaw
Sent: Tuesday, December 23, 2008 10:55 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Simulating multilevel data in Stata

Dear Statalist members,

I want to perform a simulation to show the inconsistency of the OLS and
random effects estimators when one of the regressors is correlated with
the unit-specific error component.  The specifics of the simulation are
as follows;

Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] + u[i]
+ e[i,t]

i = 1,...,500 indexes subjects
t = 1,...,3 indexes time (repeated observations on subjects)

X1 and X2 are normally distributed random variables with arbitrary means
and variances u is a normally distributed subject-specific error
component with mean of 0 and arbitrary variance e is a normally
distributed random error component with mean of 0 and arbitrary variance

corr(X1,X2) = 0.5
corr(X1,u) = 0.3
corr(X2,u) = 0.0
corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0

b0, b1, and b2 are parameters to be specified in the simulation

I have been unable to identify a method that will ensure that
corr(X1,u) equals the desired value.  I tried the following method in
which u was generated separately from X1 and X2 and cholesky
decomposition was applied to generate transformations of the three
random variables that would exhibit the desired correlations.
However, this yielded a non-zero correlation between X2 and u.

Method 1
***
drop _all
set obs 500
gen n = _n
gen u=invnorm(uniform())
expand 3
sort n
gen n2 = _n
gen t= (n2 - (n-1)*3)
drawnorm x1 x2 e
sort n
mkmat x1 x2 u e, matrix(X)
mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) mat X2 =
X*cholesky(c)
***

A method that yielded somewhat better results involved generating X1,
X2, u, and e with a pre-specified correlation matrix and then collapsing
u so that it varied by subject only.  This provided the correct values
for corr(X1,X2) and corr(X2,u) but attenuated the correlation between X1
and u.  I presume that I could simply specify a higher value for
corr(X1,u) when generating the variables so that the desired value would
be achieved after u is collapsed.  However, this would not be the most
elegant solution.

Method 2
***
drop _all
set obs 500
mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) gen n =
_n expand 3 sort n gen n2 = _n gen t= (n2 - (n-1)*3) drawnorm x1 x2 u e,
corr(c) sort n by n: egen u2 = mean(u)
***

Any suggestions or references would be appreciated.

Regards,

Jim

James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252 Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```