If we change the order of cluster sampling and stratification when sampling
the population, would the svyset command be different?
Title: Using svyset for stratified multiple-stage designs
Author: Jeffrey Pitblado, StataCorp
Date: May 2006
Suppose that you are faced with analyzing data from the following survey
design:
The population was sampled by stratifying it first and then
randomly selecting several clusters for each stratum. Within each
cluster, subclusters were randomly selected, and then for each
subcluster individuals were randomly selected.
Your first question when analyzing survey data should always be:
How do I identify the sampling design using
svyset in Stata?
Starting in Stata 9, svyset has a syntax to deal
with multiple stages of clustered sampling.
Let’s make up some variable names to represent survey design
characteristics:
| pwt | sampling weights |
| strata1 | stage 1 strata |
| su1 | stage 1 sampling units (PSUs) |
| fpc1 | stage 1 finite population correction |
| strata2 | stage 2 strata |
| su2 | stage 2 sampling units (SSUs) |
| fpc2 | stage 2 finite population correction |
... and so on for later stages (for example, fpc3 for the stage 3 finite
population correction).
Given the description above, the svyset command
should be structured as follows:
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
|| su2, fpc(fpc2) || _n, fpc(fpc3)
(/// tells Stata to continue the command on the next line in ado- or
do-files. _n in the last stage indicates that individuals were the sampling
units at that stage.)
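Once the design is declared, estimation commands pick it up through the svy
prefix. The lines below are a minimal sketch of that workflow for the design
described above. The analysis variable y is hypothetical and is not part of
the original example, and the design variables (including fpc3) are assumed
to exist in the data.

```stata
* Declare the three-stage design: stratified PSUs, then subclusters,
* then individuals (_n) within subclusters
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
    || su2, fpc(fpc2) || _n, fpc(fpc3)

* Estimate the population mean of y (a hypothetical analysis variable)
* using the declared design
svy: mean y
```

The svy prefix requires Stata 9 or later, the same releases that accept the
multistage svyset syntax.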
Prior to Stata 9, where svyset accepted only the
first-stage design variables, one might assume that the
svyset command should be as follows:
svyset [pweight=pwt], fpc(fpc1) psu(su1) strata(strata1)
However, the fpc() option should not be used in this single-stage
specification with a two-stage (or any multistage) design because the
specification assumes no sampling within PSUs. This assumption will always
result in a negative bias in the variance estimate; that is, the variance
estimate will be smaller than it should be. Simply removing the
fpc() option, as in
svyset [pweight=pwt], psu(su1) strata(strata1)
will produce appropriate variance estimates, even for multistage designs.
In Stata 9 or later, you can specify the design variables for each stage,
using || to delimit the stages.
Now suppose that the design involved cluster sampling first, and then each
cluster was stratified before the subclusters were sampled.
Here we stratified in the second stage but not the first, so we should have
a variable like strata2 instead of
strata1:
svyset su1 [pw=pwt], fpc(fpc1) ///
|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)
If our design involved stratified cluster sampling in both the first and
second stages, the svyset command would be as
follows:
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)
In Stata 9 or later, you need to know which stage each stratum variable
belongs to, because strata() is specified separately for each stage. See
[SVY] svyset for more examples of how to svyset multistage designs.
Prior to Stata 9, you would use the strata()
option only if your design had stratification in the first stage.
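As a quick check after declaring any of the designs above, it can help to
redisplay the settings and confirm that each strata and fpc variable landed
in the intended stage. The sketch below assumes the data have already been
svyset with at least two stages; stage(2) is chosen here only for
illustration.

```stata
* Redisplay the survey design settings currently attached to the data
svyset

* Summarize the strata and sampling units through the second stage
svydescribe, stage(2)
```

If a stratum variable shows up under the wrong stage, reissue svyset with
strata() moved to the correct side of the || delimiter.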