Dear statalist,

I have survey data of patients clustered within a sample of hospitals. If I use:

    svyset [pweight=weight], psu(hospital) clear
    svymean outcome

this sometimes gives me a SE less than if I had ignored the clustering in the sample:

    svyset [pweight=weight], clear
    svymean outcome

I suppose this is because there is rather less variation between cluster means than within clusters. But shouldn't an allowance for clustering in the sample increase the SE, not reduce it?

I understand the formula for the sampling variance from a cluster sample is

    var(C) = var(R)*[1+(N-1)*rho]

where var(C) is the variance from the cluster sample of equal size clusters, var(R) is the variance from a simple random sample of the same size, N is the size of a cluster, and rho is the intracluster correlation which must be between 0 and 1. So var(C) must be greater than var(R).

For some outcomes using the psu( ) option increases the SE and for others (as mentioned above) it decreases the SE. The average change for all the outcomes is probably a small increase.

Should I use the psu( ) option only if it increases SE's? Or should I always use the psu( ) option and just accept that this will sometimes decrease SE's?

Any advice would be much appreciated.

Tom
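
P.S. To make the formula above concrete, here is a minimal sketch in Python of the arithmetic as I understand it. The function name and all the numbers are hypothetical, chosen only for illustration; none of them come from my survey, and this is not what svymean computes internally.

```python
def cluster_variance(var_srs, cluster_size, rho):
    """Textbook formula: var(C) = var(R) * [1 + (N - 1) * rho].

    Assumes equal-size clusters, as in the formula quoted in the post.
    """
    return var_srs * (1 + (cluster_size - 1) * rho)


# Hypothetical inputs, for illustration only.
var_r = 0.04  # var(R): variance of the mean under simple random sampling
n = 20        # N: number of patients per hospital

for rho in (0.0, 0.05, 0.2):
    var_c = cluster_variance(var_r, n, rho)
    # Ratio of standard errors: sqrt(var(C) / var(R))
    se_ratio = (var_c / var_r) ** 0.5
    print(f"rho={rho}: var(C)={var_c:.4f}, SE ratio={se_ratio:.3f}")
```

With rho restricted to values between 0 and 1, the SE ratio from this formula never drops below 1, which is why the smaller SEs I get with the psu( ) option puzzle me.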

