[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"James Shaw" <shawjw@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: Simulating multilevel data in Stata |

Date |
Tue, 23 Dec 2008 10:55:10 -0600 |

Dear Statalist members, I want to perform a simulation to show the inconsistency of the OLS and random effects estimators when one of the regressors is correlated with the unit-specific error component. The specifics of the simulation are as follows; Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] + u[i] + e[i,t] i = 1,...,500 indexes subjects t = 1,...,3 indexes time (repeated observations on subjects) X1 and X2 are normally distributed random variables with arbitrary means and variances u is a normally distributed subject-specific error component with mean of 0 and arbitrary variance e is a normally distributed random error component with mean of 0 and arbitrary variance corr(X1,X2) = 0.5 corr(X1,u) = 0.3 corr(X2,u) = 0.0 corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0 b0, b1, and b2 are parameters to be specified in the simulation I have been unable to identify a method that will ensure that corr(X1,u) equals the desired value. I tried the following method in which u was generated separately from X1 and X2 and cholesky decomposition was applied to generate transformations of the three random variables that would exhibit the desired correlations. However, this yielded a non-zero correlation between X2 and u. Method 1 *** drop _all set obs 500 gen n = _n gen u=invnorm(uniform()) expand 3 sort n gen n2 = _n gen t= (n2 - (n-1)*3) drawnorm x1 x2 e sort n mkmat x1 x2 u e, matrix(X) mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) mat X2 = X*cholesky(c) *** A method that yielded somewhat better results involved generating X1, X2, u, and e with a pre-specified correlation matrix and then collapsing u so that it varied by subject only. This provided the correct values for corr(X1,X2) and corr(X2,u) but attenuated the correlation between X1 and u. I presume that I could simply specify a higher value for corr(X1,u) when generating the variables so that the desired value would be achieved after u is collapsed. However, this would not be the most elegant solution. Method 2 *** drop _all set obs 500 mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) gen n = _n expand 3 sort n gen n2 = _n gen t= (n2 - (n-1)*3) drawnorm x1 x2 u e, corr(c) sort n by n: egen u2 = mean(u) *** Any suggestions or references would be appreciated. Regards, Jim James W. Shaw, Ph.D., Pharm.D., M.P.H. Assistant Professor Department of Pharmacy Administration College of Pharmacy University of Illinois at Chicago 833 South Wood Street, M/C 871, Room 252 Chicago, IL 60612 Tel.: 312-355-5666 Fax: 312-996-0868 Mobile Tel.: 215-852-3045 * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: Simulating multilevel data in Stata***From:*"Stas Kolenikov" <skolenik@gmail.com>

**st: RE: Simulating multilevel data in Stata***From:*"Feiveson, Alan H. (JSC-SK311)" <Alan.H.Feiveson@nasa.gov>

- Prev by Date:
**st: estout: displaying single regression results in more columns** - Next by Date:
**st: RE: Simulating multilevel data in Stata** - Previous by thread:
**st: estout: displaying single regression results in more columns** - Next by thread:
**st: RE: Simulating multilevel data in Stata** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |