Everyone is very concerned about correcting s.e.'s for intraclass correlation, but the reality is that sometimes the intraclass correlation is simply not present. When the true rho is zero, the estimated rho will be a small positive value in some samples purely by chance, and by the same token a small negative value in others. A negative value implies that people within a cluster (or PSU) are more different from each other than they would have been had you sampled randomly (i.e., not by cluster). Intuitively this seems odd, but the formula allows it: for equal-sized clusters rho is bounded below by -1/(N-1), not by 0.

In the formula you quote, if rho=0 the cluster variance reduces to the original variance. If rho>0, the variance, and therefore the s.e., will be larger. If rho<0, the variance will be smaller. This is probably what is happening in your case.

A conservative solution is to use the corrected s.e. when it is larger, but use the original s.e. when correcting makes it smaller. If others have a better solution, I would love to hear it.
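To make the arithmetic concrete, here is a rough sketch in Python (not Stata) of the formula var(C) = var(R)*[1+(N-1)*rho] and of the conservative rule. The function names are made up for illustration. The rho estimate uses the textbook one-way ANOVA estimator for equal-sized clusters, which is an assumption on my part and not what the svy commands compute internally.

```python
from statistics import mean


def design_effect(n, rho):
    """Return var(C)/var(R) = 1 + (n - 1)*rho for equal clusters of size n."""
    lower = -1.0 / (n - 1)  # smallest rho that keeps the variance non-negative
    if not lower <= rho <= 1.0:
        raise ValueError(f"rho must lie in [{lower}, 1] when n = {n}")
    return 1.0 + (n - 1) * rho


def anova_rho(clusters):
    """One-way ANOVA estimate of rho (assumed estimator, equal-sized clusters).

    Assumes the data are not all identical, otherwise the denominator is 0.
    """
    k = len(clusters)        # number of clusters (PSUs)
    n = len(clusters[0])     # observations per cluster
    grand = mean([x for c in clusters for x in c])
    cmeans = [mean(c) for c in clusters]
    # Between-cluster and within-cluster mean squares
    msb = n * sum((m - grand) ** 2 for m in cmeans) / (k - 1)
    msw = sum((x - m) ** 2 for c, m in zip(clusters, cmeans) for x in c) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


def conservative_se(se_srs, se_cluster):
    """Conservative rule: report whichever s.e. is larger."""
    return max(se_srs, se_cluster)


# Identical cluster means, all variation within clusters: rho is negative.
spread_within = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
rho_hat = anova_rho(spread_within)
print(rho_hat, design_effect(3, rho_hat))

# No variation within clusters, all variation between them: rho is 1.
spread_between = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(anova_rho(spread_between))
```

The first toy data set is the extreme case: every hospital has the same mean, so the estimated rho sits at its lower bound and the clustered variance is smaller than the simple random sampling one. Real data will rarely be that extreme, but a slightly negative estimate has the same effect on a smaller scale.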

CDSC - Nichols, Tom wrote:

Dear statalist,

I have survey data of patients clustered within a sample of hospitals.

If I use:

svyset [pweight=weight], psu(hospital) clear

svymean outcome

this sometimes gives me a SE less than if I had ignored the clustering in the sample:

svyset [pweight=weight], clear

svymean outcome

I suppose this is because there is rather less variation between cluster means than within clusters.

But shouldn't an allowance for clustering in the sample increase the SE, not reduce it?

I understand the formula for the sampling variance from a cluster sample is

var(C) = var(R)*[1+(N-1)*rho]

where var(C) is the variance from the cluster sample of equal size clusters, var(R) is the variance from a simple random sample of the same size, N is the size of a cluster, and rho is the intracluster correlation which must be between 0 and 1.

So var(C) must be greater than var(R).

For some outcomes using the psu( ) option increases the SE and for others (as mentioned above) it decreases the SE.

The average change for all the outcomes is probably a small increase.

Should I use the psu( ) option only if it increases SE's?

Or should I always use the psu( ) option and just accept that this will sometimes decrease SE's?

Any advice would be much appreciated.

Tom

