Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: RE: Simulating multilevel data in Stata


From   "Feiveson, Alan H. (JSC-SK311)" <Alan.H.Feiveson@nasa.gov>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Simulating multilevel data in Stata
Date   Tue, 23 Dec 2008 11:15:07 -0600

Jim - 

If I understand your question, you want to "fix" your randomly generated
X1, X2 and u so that their sample covariance matrix exactly equals the
one you want. Here's one way to do this (see below.

Al Feiveson


. matrix V = I(3)
. matrix V[1,2]=.5
. matrix V[2,1]=.5
. matrix V[1,3]=.3
. matrix V[3,1]=.3

. matrix list V

symmetric V[3,3]
    c1  c2  c3
r1   1
r2  .5   1
r3  .3   0   1

. set obs 50
obs was 0, now 50

. drawnorm X1 X2 u,cov(V)

. corr X1 X2 u,cov
(obs=50)

             |       X1       X2        u
-------------+---------------------------
          X1 |  1.01209
          X2 |  .467516  .953286
           u |  .571851  .073683  1.12376

. matrix accum A = X1 X2 u,dev noc
(obs=50)

. matrix A=(1/49)*A

. matrix list A

symmetric A[3,3]
           X1         X2          u
X1  1.0120898
X2  .46751642  .95328608
 u   .5718511  .07368264   1.123759

. matrix H=cholesky(V)
. matrix G=cholesky(A)
. matrix GI=inv(G)

. matrix HGI=H*GI

. matrix list HGI

HGI[3,3]
            X1          X2           u
r1   .99400935           0           0
r2   .03111956   1.0085584           0
r3  -.34919894   .07784363    1.082162

. des

Contains data
  obs:            50                          
 vars:             3                          
 size:           800 (99.9% of memory free)
------------------------------------------------------------------------
---------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
---------------------------
X1              float  %9.0g                  
X2              float  %9.0g                  
u               float  %9.0g                  
------------------------------------------------------------------------
---------------------------
Sorted by:  
     Note:  dataset has changed since last saved

. gen y1=HGI[1,1]*X1
. gen y2=HGI[2,1]*X1+HGI[2,2]*X2
. gen y3=HGI[3,1]*X1+HGI[3,2]*X2+HGI[3,3]*u

. corr y*,cov
(obs=50)

             |       y1       y2       y3
-------------+---------------------------
          y1 |        1
          y2 |       .5        1
          y3 |       .3 -9.1e-10        1


  

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of James Shaw
Sent: Tuesday, December 23, 2008 10:55 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Simulating multilevel data in Stata

Dear Statalist members,

I want to perform a simulation to show the inconsistency of the OLS and
random effects estimators when one of the regressors is correlated with
the unit-specific error component.  The specifics of the simulation are
as follows;

Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] + u[i]
+ e[i,t]

i = 1,...,500 indexes subjects
t = 1,...,3 indexes time (repeated observations on subjects)

X1 and X2 are normally distributed random variables with arbitrary means
and variances u is a normally distributed subject-specific error
component with mean of 0 and arbitrary variance e is a normally
distributed random error component with mean of 0 and arbitrary variance

corr(X1,X2) = 0.5
corr(X1,u) = 0.3
corr(X2,u) = 0.0
corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0

b0, b1, and b2 are parameters to be specified in the simulation


I have been unable to identify a method that will ensure that
corr(X1,u) equals the desired value.  I tried the following method in
which u was generated separately from X1 and X2 and cholesky
decomposition was applied to generate transformations of the three
random variables that would exhibit the desired correlations.
However, this yielded a non-zero correlation between X2 and u.

Method 1
***
drop _all
set obs 500
gen n = _n
gen u=invnorm(uniform())
expand 3
sort n
gen n2 = _n
gen t= (n2 - (n-1)*3)
drawnorm x1 x2 e
sort n
mkmat x1 x2 u e, matrix(X)
mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) mat X2 =
X*cholesky(c)
***

A method that yielded somewhat better results involved generating X1,
X2, u, and e with a pre-specified correlation matrix and then collapsing
u so that it varied by subject only.  This provided the correct values
for corr(X1,X2) and corr(X2,u) but attenuated the correlation between X1
and u.  I presume that I could simply specify a higher value for
corr(X1,u) when generating the variables so that the desired value would
be achieved after u is collapsed.  However, this would not be the most
elegant solution.

Method 2
***
drop _all
set obs 500
mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) gen n =
_n expand 3 sort n gen n2 = _n gen t= (n2 - (n-1)*3) drawnorm x1 x2 u e,
corr(c) sort n by n: egen u2 = mean(u)
***

Any suggestions or references would be appreciated.

Regards,

Jim


James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
Department of Pharmacy Administration
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252 Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index