If we change the order of cluster sampling and stratification when sampling
the population, would the svyset command be different?
| Title | Using svyset for stratified multiple-stage designs |
| Author | Jeffrey Pitblado, StataCorp |
| Date | May 2006; updated July 2011 |
Suppose you are faced with analyzing data from the following survey
design:
The population was sampled by stratifying it first and then
randomly selecting several clusters for each stratum. Within each
cluster, subclusters were randomly selected, and then for each
subcluster individuals were randomly selected.
Your first question when analyzing survey data should always be:
How do I identify the sampling design using
svyset in Stata?
Starting in Stata 9, svyset has a syntax to deal
with multiple stages of clustered sampling.
Let’s make up some variable names to represent survey design
characteristics:
| pwt | sampling weights |
| strata1 | stage 1 strata |
| su1 | stage 1 sampling units (PSU) |
| fpc1 | stage 1 finite population correction |
| strata2 | stage 2 strata |
| su2 | stage 2 sampling units (SSU) |
| fpc2 | stage 2 finite population correction |
... you get the idea.
Given the description above, the svyset command
should be structured as follows:
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
|| su2, fpc(fpc2) || _n, fpc(fpc3)
(/// tells Stata that the command continues on the next line; it works in do-files and ado-files.)
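Once the design is declared, it can help to check what Stata recorded before estimating anything. The following is a minimal sketch, assuming the design variables named above exist in the dataset; y is a hypothetical analysis variable used only for illustration.

```stata
* Sketch only: assumes pwt, strata1, su1, su2, and fpc1-fpc3 exist in the data.
* y is a hypothetical analysis variable, not part of the original example.
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
    || su2, fpc(fpc2) || _n, fpc(fpc3)

svyset          // with no arguments, redisplays the current survey settings
svydescribe     // summarizes the strata and sampling units in the data
svy: mean y     // svy estimation commands use the declared design
```

Because the svy prefix reads the design from the svyset declaration, the design variables do not need to be repeated on each estimation command.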
Before Stata 9, when svyset accepted only the first-stage
design variables, one might have assumed that the svyset command
should be as follows:
svyset [pweight=pwt], fpc(fpc1) psu(su1) strata(strata1)
When using only the first-stage design characteristics, you must be aware
that specifying an FPC implies there was no sampling within the PSUs.
If this is not true, then specifying an FPC for the first stage will yield
negatively biased standard errors; that is, the standard error estimates will
be smaller than they should be. In this case, we recommend that you not
specify an FPC when you svyset the data.
If we remove the fpc() option, then
svyset [pweight=pwt], psu(su1) strata(strata1)
will produce appropriate variance estimates, even for multistage designs.
The previous assertion is also valid if you are using the modern syntax
for svyset, but, for some reason, you can only specify the first-stage
characteristics. For example, some datasets come only with information
on stratification and sampling units on the first stage, even if they
have been collected via a multistage design. If this is the case,
fpc() should not be used for the reasons explained above.
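As a minimal sketch of that situation, assuming the dataset supplies only pwt, su1, and strata1, the first-stage-only declaration in the modern syntax would look like this, with fpc() left out:

```stata
* Only first-stage design information is available, so fpc() is omitted.
* Variable names are the illustrative ones used throughout this FAQ.
svyset su1 [pweight=pwt], strata(strata1)
```

This is the modern counterpart of the pre-Stata 9 command shown earlier without the fpc() option.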
In current versions of Stata (9 and later), you can specify the design
variables for each stage, using || to delimit the stages.
Now suppose the design involved cluster sampling first, and then each
cluster was stratified before the subclusters were sampled.
Here we stratified in the second stage but not the first, so we should have
a variable like strata2 instead of
strata1:
svyset su1 [pw=pwt], fpc(fpc1) ///
|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)
If our design involved stratified cluster sampling in both the first and
second stages, the svyset command would be as
follows:
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)
In current versions of Stata, you need to know which stage each strata
variable applies to. See [SVY] svyset for
more examples of how to svyset multistage designs.
Prior to Stata 9, you would use the strata()
option only if your design had stratification in the first stage.