# Re: st: Simulating multilevel data in Stata

 From "Stas Kolenikov" To statalist@hsphsun2.harvard.edu Subject Re: st: Simulating multilevel data in Stata Date Tue, 23 Dec 2008 15:28:38 -0600

```Here's what I would suggest.

X1, X2 and u have a multivariate normal distribution, say (unless you
want to study the effects of non-normality, as well). Let us write the
covariance matrix as Sigma, and block it as Sigma_11, Sigma_12 and
Sigma_22 so that the upper block corresponds to the time varying X
variables, and the lower block, to the panel effect u:

mata:
Sigma = (1, 0.5, 0.3 \ 0.5, 1, 0 \ 0.3, 0, 1)
Sigma11 = Sigma[|(1,1) \ (2,2)|]
Sigma21 = Sigma[|(3,1) \ (3,2)|]
Sigma22 = Sigma[3,3]

// 1. Sample u from say standard normal distribution:
u = rnormal(1,1,0,1)

// 2. Figure out the conditional distribution X1, X2 | u: the mean is
mean_u = invsym(Sigma11) * Sigma21' * u
Sigma11_u = Sigma11 - Sigma21' * Sigma21 / Sigma22
chS = cholesky(Sigma11_u)

// 3. Draw normal X1 and X2
X = rnormal(1,2,0,1)
X = X*chS

// I hope I did not mess up with all the transposes and such :))

// Now, this will generate a triplet of u, X1, X2.
// You can repeat step 3 a bunch of times to get T values,
// and that's doable simply by replacing the lines to
X = rnormal(T,2,0,1)
X = X*chS

// You will also need to propagate the values of u:
U = u*J(3,1,1)

// Finally you can add (standard normal) e on top of everything:
E = rnormal(3,1,0,1)

// Now, a simulated panel will look like
X, U, E

end
// of Mata

You now will have to repeat the process N times, and post everything
back to Stata using st_view(). Most of this stuff is of course doable
in strict Stata, as well. The critical steps are matrix blocking and
Cholesky decomposition. You can just do those for your fixed
covariance matrix of interest, and hard code the numbers into some
-generate- and -replace- statements.

On Tue, Dec 23, 2008 at 10:55 AM, James Shaw <shawjw@gmail.com> wrote:
> Dear Statalist members,
>
> I want to perform a simulation to show the inconsistency of the OLS
> and random effects estimators when one of the regressors is correlated
> with the unit-specific error component.  The specifics of the
> simulation are as follows;
>
> Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] +
> u[i] + e[i,t]
>
> i = 1,...,500 indexes subjects
> t = 1,...,3 indexes time (repeated observations on subjects)
>
> X1 and X2 are normally distributed random variables with arbitrary
> means and variances
> u is a normally distributed subject-specific error component with mean
> of 0 and arbitrary variance
> e is a normally distributed random error component with mean of 0 and
> arbitrary variance
>
> corr(X1,X2) = 0.5
> corr(X1,u) = 0.3
> corr(X2,u) = 0.0
> corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0
>
> b0, b1, and b2 are parameters to be specified in the simulation
>
>
> I have been unable to identify a method that will ensure that
> corr(X1,u) equals the desired value.  I tried the following method in
> which u was generated separately from X1 and X2 and cholesky
> decomposition was applied to generate transformations of the three
> random variables that would exhibit the desired correlations.
> However, this yielded a non-zero correlation between X2 and u.
>
> Method 1
> ***
> drop _all
> set obs 500
> gen n = _n
> gen u=invnorm(uniform())
> expand 3
> sort n
> gen n2 = _n
> gen t= (n2 - (n-1)*3)
> drawnorm x1 x2 e
> sort n
> mkmat x1 x2 u e, matrix(X)
> mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1)
> mat X2 = X*cholesky(c)
> ***
>
> A method that yielded somewhat better results involved generating X1,
> X2, u, and e with a pre-specified correlation matrix and then
> collapsing u so that it varied by subject only.  This provided the
> correct values for corr(X1,X2) and corr(X2,u) but attenuated the
> correlation between X1 and u.  I presume that I could simply specify a
> higher value for corr(X1,u) when generating the variables so that the
> desired value would be achieved after u is collapsed.  However, this
> would not be the most elegant solution.
>
> Method 2
> ***
> drop _all
> set obs 500
> mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1)
> gen n = _n
> expand 3
> sort n
> gen n2 = _n
> gen t= (n2 - (n-1)*3)
> drawnorm x1 x2 u e, corr(c)
> sort n
> by n: egen u2 = mean(u)
> ***
>
> Any suggestions or references would be appreciated.
>
> Regards,
>
> Jim
>
>
> James W. Shaw, Ph.D., Pharm.D., M.P.H.
> Assistant Professor
> College of Pharmacy
> University of Illinois at Chicago
> 833 South Wood Street, M/C 871, Room 252
> Chicago, IL 60612
> Tel.: 312-355-5666
> Fax: 312-996-0868
> Mobile Tel.: 215-852-3045
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```