Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Georgia Ntani" <gn@mrc.soton.ac.uk> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | st: Generate correlated binary data for fixed rho |
Date | Fri, 23 Nov 2012 17:56:32 +0000 |
Dear Statalisters I am trying to generate binary correlated data for fixed parameter b (effect of covariates on the outcome), number and size of clusters, and ICC. For this, I follow the algorithm suggested by Santos et al (Santos et al. Estimating adjusted prevalence ratio in clustered cross-sectional epidemiological data. BMC Medical research methodology). Below, is what I am trying to do in algorithm steps. 1. Set 10 clusters with 30 observations per cluster 2. Generate a dichotomous independent variable X1j. This will be a group-level variable (X1j=1 for the first 5 clusters and 0 for the last 5 clusters) 3. Generate a continuous independent variable X2ij from a Normal(0,1) distribution. That will be an individual level variable 4. Generate a normal variable u0, such that for given cluster j, uoj~N(0,sigma_u^2), where uoj and uoj' are independent for j different from j'. The intraclass correlation coefficient (ICC) is defined as (sigma_u^2 )/(sigma_u^2 +(pi^2 /3)) 5. Generate probability of the outcome that will be from a random effects logistic model Pij=exp(X*b+u0)/(1+exp(X*b+u0)) 6. Generate a binary outcome Yij with probability Pij The Stata commands that I used to run the above are set obs 300 gen cluster=1 if _n<=30 forvalues i=2(1)10 { replace cluster=`i' if _n<=`i'*30 & cluster==. } gen x1=1 if cluster<=5 replace x1=0 if x1==. generate x2 = invnorm(uniform()) preserve /* From the formula above,if ICC=0.03 then sigma_u will be /// sqrt(0.01*_pi^2)/0.97 */ di sqrt(0.01*_pi^2)/0.97 set seed 54325821 clear all corr2data u01 u02 u03 u04 u05 u06 u07 u08 u09 u010, n(30) /// means(0 0 0 0 0 0 0 0 0 0) sds(.319 .319 .319 /// .319 .319 .319 .319 .319 .319 .319) gen sn=_n reshape long u0 , i(sn) j(cluster) drop sn sort cluster u0 save randomerror.dta, replace restore sort cluster merge cluster using randomerror drop _merge gen prob=exp(x1*0.47+x2*0.3665+u0)/(1+exp(x1*0.47+x2*0.3665+u0)) gen u=runiform() gen y = (u > prob) The problem now comes when I run the command - xtlogit y i.x1 x2, i(cluster) quad(30) or-- What I get is the following Random-effects logistic regression Number of obs = 300 Group variable: cluster Number of groups = 10 Random effects u_i ~ Gaussian Obs per group: min = 30 avg = 30.0 max = 30 Wald chi2(2) = 14.95 Log likelihood = -199.56666 Prob > chi2 = 0.0006 ------------------------------------------------------------------------------ y | OR Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- 1.x1 | .5943426 .1412417 -2.19 0.029 .373039 .9469334 x2 | .6737118 .081714 -3.26 0.001 .5311688 .8545073 -------------+---------------------------------------------------------------- /lnsig2u | -15.53268 280.1426 -564.6021 533.5368 -------------+---------------------------------------------------------------- sigma_u | .0004238 .0593568 2.5e-123 7.2e+115 rho | 5.46e-08 .0000153 1.9e-246 1 ------------------------------------------------------------------------------ Likelihood-ratio test of rho=0: chibar2(01) = 0.00 Prob >= chibar2 = 1.000 My question is why I am getting a sigma_u of 0.0004? Aren't I supposed to be getting sigma_u of 0.31 from the defined u0 (random error) above? Any help on this would be highly appreciated Many thanks Georgia * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/