st: Simulating multilevel data in Stata

 From "James Shaw" To statalist@hsphsun2.harvard.edu Subject st: Simulating multilevel data in Stata Date Tue, 23 Dec 2008 10:55:10 -0600

```Dear Statalist members,

I want to perform a simulation to show the inconsistency of the OLS
and random effects estimators when one of the regressors is correlated
with the unit-specific error component.  The specifics of the
simulation are as follows;

Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] +
u[i] + e[i,t]

i = 1,...,500 indexes subjects
t = 1,...,3 indexes time (repeated observations on subjects)

X1 and X2 are normally distributed random variables with arbitrary
means and variances
u is a normally distributed subject-specific error component with mean
of 0 and arbitrary variance
e is a normally distributed random error component with mean of 0 and
arbitrary variance

corr(X1,X2) = 0.5
corr(X1,u) = 0.3
corr(X2,u) = 0.0
corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0

b0, b1, and b2 are parameters to be specified in the simulation

I have been unable to identify a method that will ensure that
corr(X1,u) equals the desired value.  I tried the following method in
which u was generated separately from X1 and X2 and cholesky
decomposition was applied to generate transformations of the three
random variables that would exhibit the desired correlations.
However, this yielded a non-zero correlation between X2 and u.

Method 1
***
drop _all
set obs 500
gen n = _n
gen u=invnorm(uniform())
expand 3
sort n
gen n2 = _n
gen t= (n2 - (n-1)*3)
drawnorm x1 x2 e
sort n
mkmat x1 x2 u e, matrix(X)
mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1)
mat X2 = X*cholesky(c)
***

A method that yielded somewhat better results involved generating X1,
X2, u, and e with a pre-specified correlation matrix and then
collapsing u so that it varied by subject only.  This provided the
correct values for corr(X1,X2) and corr(X2,u) but attenuated the
correlation between X1 and u.  I presume that I could simply specify a
higher value for corr(X1,u) when generating the variables so that the
desired value would be achieved after u is collapsed.  However, this
would not be the most elegant solution.

Method 2
***
drop _all
set obs 500
mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1)
gen n = _n
expand 3
sort n
gen n2 = _n
gen t= (n2 - (n-1)*3)
drawnorm x1 x2 u e, corr(c)
sort n
by n: egen u2 = mean(u)
***

Any suggestions or references would be appreciated.

Regards,

Jim

James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252
Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```