Levy and Lemeshow use Stata examples in their text Sampling of
Populations. I am unclear whether in my situation the code they supply
is applicable.
For a health risk survey I am planning to sample the population of
students in an entire college by sampling classrooms (clusters). All
students in each selected classroom would fill out the form. Students
who fill out the form are given a token so that if another of their
classes is sampled they would not submit another form. (In some large
freshman classes there could be quite a lot of overlap.)
This appears to be a simple one-stage cluster sample. The clusters are
the classrooms (minus those already surveyed) and their identifiers
are entered as the PSU. The pweight is the total number of classes
divided by the number of sampled classes, entered for each case. The
FPC would be the total number of classes in the sampling frame.
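To make the setup concrete, here is a minimal sketch in Python of the
quantities described above. The counts (400 classes in the frame, 40
sampled, and the forms collected per class) are made-up numbers for
illustration only, and the function name is hypothetical. It assumes
classes are drawn by simple random sampling without replacement, so
every case gets the same weight.

```python
def cluster_design(total_classes, sampled_classes):
    """Design quantities for a one-stage cluster sample of classes.

    Assumes simple random sampling of classes without replacement,
    so every respondent carries the same weight.
    """
    if not 0 < sampled_classes <= total_classes:
        raise ValueError("need 0 < sampled_classes <= total_classes")
    return {
        # Weight entered for each case: classes in frame / classes sampled.
        "pweight": total_classes / sampled_classes,
        # Finite population correction input: total classes in the frame.
        "fpc": total_classes,
        # Fraction of the clusters that were sampled.
        "sampling_fraction": sampled_classes / total_classes,
    }


# Hypothetical numbers, not from the actual survey.
design = cluster_design(total_classes=400, sampled_classes=40)

# Hypothetical forms collected in three of the sampled classes, after
# students who already hold a token have been excluded.
forms_per_class = [25, 18, 31]
# Number of students in the population that these forms represent.
represented = design["pweight"] * sum(forms_per_class)

print(design, represented)
```

With these made-up counts each completed form stands for 10 students,
and the FPC value is 400.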
My concern is that the standard errors would be very large. Levy and
Lemeshow suggest sampling proportional to size as a way of dealing with
large standard errors, but discuss only cases where the sample size is
identical in each cluster (which does not fit my situation).
So my first question is whether "sampling proportional to size" can be
(or needs to be) adapted to this situation.
Another way of dealing with large standard errors would be to stratify
by class size--but because of the duplication of students in different
classes I do not have an accurate picture of the "real" class size
(minus duplicates).
Any suggestions would be welcome, especially if you have dealt with the
same situation (which I think must be common).
Dan Chandler
******
Dan Chandler, Ph.D.
436 Old Wagon Road
Trinidad, CA 95570
707 677 0895 (fax or phone)