Stata | FAQ: Using svyset for stratified multiple-stage designs

If we change the order of cluster sampling and stratification when sampling the population, would the svyset command be different?

Title		Using svyset for stratified multiple-stage designs
Author		Jeffrey Pitblado, StataCorp

Suppose you are faced with analyzing data from the following survey design:

The population was sampled by stratifying it first and then randomly selecting several clusters for each stratum. Within each cluster, subclusters were randomly selected, and then for each subcluster individuals were randomly selected.

Your first question when analyzing survey data should always be:

How do I identify the sampling design using svyset in Stata?

Starting in Stata 9, svyset has a syntax to deal with multiple stages of clustered sampling.

Let’s make up some variable names to represent survey design characteristics:

pwt	sampling weights
strata1	stage 1 strata
su1	stage 1 sampling units (PSU)
fpc1	stage 1 finite population correction
strata2	stage 2 strata
su2	stage 2 sampling units (SSU)
fpc2	stage 2 finite population correction

... and so on for later stages (for example, fpc3 for a stage 3 finite population correction).

Given the description above, the svyset command should be structured as follows:

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)		///
	|| su2, fpc(fpc2) || _n, fpc(fpc3)

(/// tells Stata to continue to the next line in ado- or do-files.)
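To try this syntax without real survey data, here is a minimal sketch that builds a toy dataset and declares the three-stage design. Everything about the data is an assumption made for illustration: the number of strata, the sample and population sizes at each stage, the constant weight, and the outcome variable y are all hypothetical. Only the variable names pwt, strata1, su1, su2, and fpc1 to fpc3 follow the naming used above.

```stata
* Hypothetical toy data: 4 strata, 3 PSUs per stratum, 2 SSUs per PSU,
* 5 individuals per SSU. All sizes below are made up for illustration.
clear
set seed 2024
set obs 4
generate strata1 = _n
expand 3                                  // 3 sampled PSUs per stratum
bysort strata1: generate su1 = 10*strata1 + _n
expand 2                                  // 2 sampled SSUs per PSU
bysort su1: generate su2 = 10*su1 + _n
expand 5                                  // 5 sampled individuals per SSU

* Assumed population counts at each stage (used as FPCs)
generate fpc1 = 30                        // PSUs per stratum in the population
generate fpc2 = 8                         // SSUs per PSU in the population
generate fpc3 = 50                        // individuals per SSU in the population

* Sampling weight = product of the inverse sampling fractions
generate pwt = (30/3)*(8/2)*(50/5)

generate y = rnormal()                    // hypothetical outcome

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)		///
	|| su2, fpc(fpc2) || _n, fpc(fpc3)

svydescribe                               // summarizes the first stage
svy: mean y
```

Running svyset with no arguments afterward redisplays the stored design, which is a quick way to check that each stage was recorded as intended.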

Before Stata 9, when svyset accepted only the first-stage design variables, one might have assumed that the svyset command should be as follows:

svyset [pweight=pwt], fpc(fpc1) psu(su1) strata(strata1)

When using only the first-stage design characteristics, you must be aware that specifying an FPC implies there was no sampling within the PSUs. If this is not true, then specifying an FPC for the first stage will yield negatively biased standard errors; that is, the standard error estimates will be smaller than they should be. In this case, we recommend you not svyset an FPC.

If we remove the fpc() option, then

svyset [pweight=pwt], psu(su1) strata(strata1)

will produce appropriate variance estimates, even for multistage designs.

The same reasoning applies if you are using the modern svyset syntax but can specify only the first-stage characteristics. For example, some datasets provide stratification and sampling-unit information only for the first stage, even though they were collected via a multistage design. In that case, fpc() should not be used, for the reasons explained above.
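The effect of a first-stage FPC can be seen directly by fitting the same estimate with and without it. The sketch below uses simulated data in which individuals were sampled within each PSU, so a first-stage-only fpc() is not appropriate. The design sizes, the outcome y, and the scalar names se_fpc and se_nofpc are all hypothetical choices for this illustration.

```stata
* Hypothetical toy data: 4 strata, 3 of 6 PSUs sampled per stratum,
* and 10 of 100 individuals sampled within each PSU.
clear
set seed 2024
set obs 4
generate strata1 = _n
expand 3
bysort strata1: generate su1 = 10*strata1 + _n
generate u = rnormal()                    // PSU-level effect (made up)
expand 10                                 // individuals sampled within PSUs
generate y = u + rnormal()
generate fpc1 = 6                         // PSUs per stratum in the population
generate pwt = (6/3)*(100/10)

* First-stage design only, WITH an FPC (not recommended here)
svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)
svy: mean y
matrix V = e(V)
scalar se_fpc = sqrt(V[1,1])

* First-stage design only, WITHOUT an FPC
svyset su1 [pw=pwt], strata(strata1)
svy: mean y
matrix V = e(V)
scalar se_nofpc = sqrt(V[1,1])

display "SE with first-stage FPC: " se_fpc
display "SE without FPC:          " se_nofpc
```

The point estimate is the same in both runs; only the standard error changes, and the version with fpc(fpc1) reports the smaller one.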

In current versions of Stata, you can specify the design variables for each stage, using || to delimit the stages.

Now suppose the design involved cluster sampling first, and then each cluster was stratified before the subclusters were sampled. Here we stratified in the second stage but not the first, so we should have a variable like strata2 instead of strata1:

svyset su1 [pw=pwt], fpc(fpc1)				///
	|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)
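As a rough check of this second layout, the following sketch simulates a design with no first-stage strata and with stratification inside each PSU. The sizes, the constant weight, and the outcome y are hypothetical. We assume here that strata2 is coded within each PSU (values 1 and 2 repeat across PSUs) and that fpc2 counts the SSUs in the population within each second-stage stratum.

```stata
* Hypothetical toy data: 5 PSUs, 2 second-stage strata per PSU,
* 2 SSUs per second-stage stratum, 4 individuals per SSU.
clear
set seed 2024
set obs 5
generate su1 = _n                         // no stratification in stage 1
expand 2
bysort su1: generate strata2 = _n         // strata defined within each PSU
expand 2                                  // 2 sampled SSUs per stage-2 stratum
bysort su1 strata2: generate su2 = 100*su1 + 10*strata2 + _n
expand 4                                  // 4 sampled individuals per SSU

* Assumed population counts at each stage (used as FPCs)
generate fpc1 = 40                        // PSUs in the population
generate fpc2 = 6                         // SSUs per stage-2 stratum
generate fpc3 = 20                        // individuals per SSU

generate pwt = (40/5)*(6/2)*(20/4)        // product of inverse sampling fractions
generate y = rnormal()                    // hypothetical outcome

svyset su1 [pw=pwt], fpc(fpc1)				///
	|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)

svydescribe, stage(2)                     // describe the second stage
svy: mean y
```

Because there is no strata() in the first stage, the estimation output reports a single stratum and five PSUs; the second-stage stratification enters only through the later-stage variance terms.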

If our design involved stratified cluster sampling in both the first and second stages, the svyset command would be as follows:

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1)		///
	|| su2, strata(strata2) fpc(fpc2) || _n, fpc(fpc3)

In current versions of Stata, you need to know which stage each stratum variable belongs to, because strata() is specified separately for each stage. See [SVY] svyset for more examples of how to svyset multistage designs.

Prior to Stata 9, you would use the strata() option only if your design had stratification in the first stage.
