Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Generating correlated binary data for given ICC

From   Georgia Ntani <>
Subject   st: Generating correlated binary data for given ICC
Date   Fri, 30 Nov 2012 10:18:40 +0000

Dear Statalisters

I apologise for posting the same question for a second time. However, since I did not get any reply I thought it may be because of how I have set up my question and I thought I should simplify it.

I am trying to generate binary correlated data for fixed parameter β (effect of covariates on the outcome), number and size of clusters, and ICC. For this, I follow the algorithm suggested by Santos et al (Santos et al. Estimating adjusted prevalence ratio in clustered cross-sectional epidemiological data. BMC Medical research methodology). Below is my Stata syntax for this with corresponding comments.

/* Set up 10 clusters with 30 obs per cluster */

set obs 300

gen cluster=1 if _n<=30

forvalues i=2(1)10 {

replace cluster=`i' if _n<=`i'*30 & cluster==.


/* Level 2 binary explanatory variable x1 – equal number of clusters each category */

gen x1=1 if cluster<=5

replace x1=0 if x1==.

/* Level 1 continuous independent variable X2ij//from a Normal(0,1) distribution */

generate x2 = invnorm(uniform())

/* Generate error term. That will be a normal variable such that for given cluster j, uoj ~ N(0,sigma_u^2 ), where uoj and uoj' are independent for j ≠j'. The intraclass correlation coefficient (ICC) is defined as (sigma_u^2 )/(sigma_u^2 +(pi^2 /3)) */


/* From the formula above,if ICC=0.03 then sigma_u will be sqrt(0.01*_pi^2)/0.97 */

di sqrt(0.01*_pi^2)/0.97

set seed 54325821

clear all

corr2data u01 u02 u03 u04 u05 u06 u07 u08 u09 u010, n(30) ///

means(0 0 0 0 0 0 0 0 0 0) sds(.319 .319 .319 ///

.319 .319 .319 .319 .319 .319 .319)

gen sn=_n

reshape long u0 , i(sn) j(cluster)

drop sn

sort cluster u0

save randomerror.dta, replace


sort cluster

merge cluster using randomerror

drop _merge

/* Probability of the outcome from the random effects logistic model */

gen prob=exp(x1*0.47+x2*0.3665+u0)/(1+exp(x1*0.47+x2*0.3665+u0))

/* Binary outcome from the Bernoulli distribution with probability prob */

gen u=runiform()

gen y = (u > prob)

What I don’t understand is why when I run the above syntax, and then the command for multilevel logistic

xtlogit y i.x1 x2, i(cluster) quad(30) or

I get such a low sigma_u (approximately zero and thus a very low rho) when I should be getting sigma_u=0.319

Any help on this would be greatly appreciated.

Many thanks


*   For searches and help try:

© Copyright 1996–2016 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index