
If we change the order of cluster sampling and stratification when sampling the population, would the svyset command be different?

Title   Using svyset for stratified multiple-stage designs
Author Jeffrey Pitblado, StataCorp
Date May 2006

Suppose that you are faced with analyzing data from the following survey design:

The population was sampled by stratifying it first and then randomly selecting several clusters for each stratum. Within each cluster, subclusters were randomly selected, and then for each subcluster individuals were randomly selected.

Your first question when analyzing survey data should always be:

How do I identify the sampling design using svyset in Stata?

Starting in Stata 9, svyset has a syntax to deal with multiple stages of clustered sampling.

Let’s make up some variable names to represent survey design characteristics:

pwt       sampling weights
strata1   stage 1 strata
su1       stage 1 sampling units (PSU)
fpc1      stage 1 finite population correction
strata2   stage 2 strata
su2       stage 2 sampling units (SSU)
fpc2      stage 2 finite population correction

... you get the idea.

Given the description above, the svyset command should be structured as follows:

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)		///
	|| su2, fpc(fpc2) || _n, fpc(fpc3)
(/// tells Stata to continue to the next line in ado- or do-files.)
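
To see this command at work, here is a minimal sketch on a made-up dataset. Everything in it is an assumption for illustration: the numbers of strata, clusters, subclusters, and individuals, the population sizes stored in fpc1 through fpc3, the constant weight, and the outcome variable y. It only shows that the three-stage specification is accepted and how to inspect it.

```stata
* Hypothetical three-stage sample: 4 strata, 5 PSUs per stratum,
* 4 subclusters per PSU, 10 individuals per subcluster.
clear
set seed 2006
set obs 20
generate su1     = _n             // stage 1 sampling units (PSU)
generate strata1 = ceil(_n/5)     // stage 1 strata: 5 PSUs in each
generate fpc1    = 50             // assumed 50 PSUs per stratum in the population
expand 4
generate su2     = _n             // stage 2 sampling units, nested in su1
generate fpc2    = 20             // assumed 20 subclusters per PSU
expand 10
generate fpc3    = 100            // assumed 100 individuals per subcluster
generate pwt     = (50/5)*(20/4)*(100/10)   // inverse selection probability
generate y       = rnormal()      // made-up outcome

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)		///
	|| su2, fpc(fpc2) || _n, fpc(fpc3)

svyset          // typed alone, redisplays the current settings
svydescribe     // strata and sampling units in the first stage
svy: mean y     // estimation uses the declared design
```

Here the fpc variables hold population counts; svyset also accepts sampling rates between 0 and 1 in fpc().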

Prior to Stata 9, svyset accepted only the first-stage design variables, so one might assume that the svyset command should be as follows:

svyset [pweight=pwt], fpc(fpc1) psu(su1) strata(strata1)

However, the fpc() option should not be used in this way with a two-stage (or any multistage) design, because this specification assumes no sampling within PSUs; that is, it assumes that every unit in a sampled PSU was observed. This assumption will always result in a negative bias in the variance estimate; thus, the variance estimate will be smaller than it should be. Simply removing the fpc() option, as in

svyset [pweight=pwt], psu(su1) strata(strata1)

will produce appropriate variance estimates, even for multistage designs.
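
The effect of a first-stage-only fpc() can be seen in a small sketch. The data are made up (20 PSUs in 4 strata, 8 sampled members per PSU, a hypothetical outcome y), and the commands are written in the Stata 9 syntax, where the PSU variable follows svyset instead of going in psu(). The point is only to compare the two standard errors.

```stata
* Hypothetical two-stage sample in which members were subsampled within PSUs.
clear
set seed 2006
set obs 20
generate su1     = _n
generate strata1 = ceil(_n/5)     // 5 sampled PSUs per stratum
generate fpc1    = 50             // assumed 50 PSUs per stratum in the population
generate u1      = rnormal()      // PSU-level effect
expand 8                          // 8 sampled members per PSU
generate pwt     = (50/5)*(80/8)  // assumes 80 members per PSU
generate y       = u1 + rnormal()

* First-stage design only, with fpc(): treats each PSU as fully observed
svyset su1 [pweight=pwt], strata(strata1) fpc(fpc1)
svy: mean y

* First-stage design only, without fpc()
svyset su1 [pweight=pwt], strata(strata1)
svy: mean y
```

The point estimate is the same in both runs; the standard error from the first run should be the smaller of the two, because the finite population correction shrinks the first-stage variance and nothing accounts for the sampling within PSUs.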

In current versions of Stata (Stata 9 and later), you can specify the design variables for each stage, using || to delimit the stages.

Now suppose that the design involved cluster sampling first, and then each cluster was stratified before the subclusters were sampled.

Here we stratified in the second stage but not the first, so we should have a variable like strata2 instead of strata1:

svyset su1 [pw=pwt], fpc(fpc1)				///
	|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)

If our design involved stratified cluster sampling in both the first and second stages, the svyset command would be as follows:

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)		///
	|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)

In current versions of Stata, you need to know which stage each stratum variable belongs to. See [SVY] svyset for more examples of how to svyset multistage designs.

Prior to Stata 9, you would use the strata() option only if your design had stratification in the first stage.
