Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.

# st: Generating correlated binary data for given ICC

 From Georgia Ntani To statalist@hsphsun2.harvard.edu Subject st: Generating correlated binary data for given ICC Date Fri, 30 Nov 2012 10:18:40 +0000

```Dear Statalisters

```
I apologise for posting the same question for a second time. However, since I did not get any reply I thought it may be because of how I have set up my question and I thought I should simplify it.
```
```
I am trying to generate binary correlated data for fixed parameter β (effect of covariates on the outcome), number and size of clusters, and ICC. For this, I follow the algorithm suggested by Santos et al (Santos et al. Estimating adjusted prevalence ratio in clustered cross-sectional epidemiological data. BMC Medical research methodology). Below is my Stata syntax for this with corresponding comments.
```
/* Set up 10 clusters with 30 obs per cluster */

set obs 300

gen cluster=1 if _n<=30

forvalues i=2(1)10 {

replace cluster=`i' if _n<=`i'*30 & cluster==.

}

```
/* Level 2 binary explanatory variable x1 – equal number of clusters each category */
```
gen x1=1 if cluster<=5

replace x1=0 if x1==.

```
/* Level 1 continuous independent variable X2ij//from a Normal(0,1) distribution */
```
generate x2 = invnorm(uniform())

```
/* Generate error term. That will be a normal variable such that for given cluster j, uoj ~ N(0,sigma_u^2 ), where uoj and uoj' are independent for j ≠j'. The intraclass correlation coefficient (ICC) is defined as (sigma_u^2 )/(sigma_u^2 +(pi^2 /3)) */
```
preserve

```
/* From the formula above,if ICC=0.03 then sigma_u will be sqrt(0.01*_pi^2)/0.97 */
```
di sqrt(0.01*_pi^2)/0.97

set seed 54325821

clear all

corr2data u01 u02 u03 u04 u05 u06 u07 u08 u09 u010, n(30) ///

means(0 0 0 0 0 0 0 0 0 0) sds(.319 .319 .319 ///

.319 .319 .319 .319 .319 .319 .319)

gen sn=_n

reshape long u0 , i(sn) j(cluster)

drop sn

sort cluster u0

save randomerror.dta, replace

restore

sort cluster

merge cluster using randomerror

drop _merge

/* Probability of the outcome from the random effects logistic model */

gen prob=exp(x1*0.47+x2*0.3665+u0)/(1+exp(x1*0.47+x2*0.3665+u0))

/* Binary outcome from the Bernoulli distribution with probability prob */

gen u=runiform()

gen y = (u > prob)

```
What I don’t understand is why when I run the above syntax, and then the command for multilevel logistic
```
xtlogit y i.x1 x2, i(cluster) quad(30) or

```
I get such a low sigma_u (approximately zero and thus a very low rho) when I should be getting sigma_u=0.319
```
Any help on this would be greatly appreciated.

Many thanks

Georgia

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```