st: Generating correlated binary data for given ICC

From	Georgia Ntani <[email protected]>
To	[email protected]
Subject	st: Generating correlated binary data for given ICC
Date	Fri, 30 Nov 2012 10:18:40 +0000

Dear Statalisters

I apologise for posting the same question a second time. However, since I did not get any reply, I thought it might be because of how I had set up my question, so I have simplified it.

I am trying to generate correlated binary data for a fixed parameter β (the effect of covariates on the outcome), a given number and size of clusters, and a given ICC. For this, I follow the algorithm suggested by Santos et al. (Santos et al., Estimating adjusted prevalence ratio in clustered cross-sectional epidemiological data, BMC Medical Research Methodology). Below is my Stata syntax for this, with corresponding comments.


/* Set up 10 clusters with 30 obs per cluster */
set obs 300
gen cluster=1 if _n<=30
forvalues i=2(1)10 {
    replace cluster=`i' if _n<=`i'*30 & cluster==.
}

/* Level 2 binary explanatory variable x1: equal number of clusters in each category */
gen x1=1 if cluster<=5
replace x1=0 if x1==.

/* Level 1 continuous independent variable x2 (X2ij) from a Normal(0,1) distribution */
generate x2 = invnorm(uniform())

/* Generate error term. That will be a normal variable such that for a given cluster j, u0j ~ N(0, sigma_u^2), where u0j and u0j' are independent for j ≠ j'. The intraclass correlation coefficient (ICC) is defined as sigma_u^2/(sigma_u^2 + pi^2/3) */
preserve

/* From the formula above, if ICC=0.03 then sigma_u will be sqrt(0.01*_pi^2/0.97), which is approximately 0.319 */
di sqrt(0.01*_pi^2/0.97)

set seed 54325821
clear all
corr2data u01 u02 u03 u04 u05 u06 u07 u08 u09 u010, n(30) ///
    means(0 0 0 0 0 0 0 0 0 0) ///
    sds(.319 .319 .319 .319 .319 .319 .319 .319 .319 .319)
gen sn=_n
reshape long u0 , i(sn) j(cluster)
drop sn
sort cluster u0
save randomerror.dta, replace
restore

sort cluster
merge cluster using randomerror
drop _merge

/* Probability of the outcome from the random effects logistic model */

gen prob=exp(x1*0.47+x2*0.3665+u0)/(1+exp(x1*0.47+x2*0.3665+u0))

/* Binary outcome from the Bernoulli distribution with probability prob (y=1 with probability prob) */
gen u=runiform()
gen y = (u < prob)

What I don't understand is why, when I run the above syntax and then the command for multilevel logistic regression,

xtlogit y i.x1 x2, i(cluster) quad(30) or

I get such a low sigma_u (approximately zero, and thus a very low rho) when I should be getting sigma_u=0.319.
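
For reference, the target value sigma_u=0.319 comes from inverting the ICC definition given in the comments above. A short worked derivation, assuming the level-1 (logistic) variance is pi^2/3 as in that definition and writing rho for the ICC:

```latex
\[
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\quad\Longrightarrow\quad
\sigma_u^2 = \frac{\rho}{1-\rho}\cdot\frac{\pi^2}{3}
\]
\[
\rho = 0.03:\qquad
\sigma_u^2 = \frac{0.03}{0.97}\cdot\frac{\pi^2}{3}
           = \frac{0.01\,\pi^2}{0.97} \approx 0.1017,
\qquad
\sigma_u = \sqrt{0.1017} \approx 0.319
\]
```

Note that the division by 0.97 sits inside the square root; taking the root first and dividing afterwards would give roughly 0.324 rather than 0.319.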


Any help on this would be greatly appreciated.

Many thanks

Georgia

