Stata Release 9: Survey statistics


This page contains only historical information and is not about the current release of Stata. Please see our Stata 19 page for information on the current version of Stata.

Survey statistics

With the addition of balanced repeated replication (BRR) and the survey jackknife, Stata is now the only full-featured statistical package to directly support all three major variance estimators for survey and correlated data: BRR, jackknife, and cluster-based linearization.
Complete support is now included for multistage designs and for poststratification. To handle multistage designs, you specify the design when you svyset your data. To set a two-stage cluster design on cities and schools within cities, stratifying on state, you type

. svyset city [pw=wgt], strata(state) fpc(ncities) || school, fpc(nschools)

If you also wanted to poststratify on gender, you would add poststrata(gender) postweight(ngender). After this svyset, all of Stata’s survey estimators will properly account for your design.
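
As a sketch of how the two pieces fit together, the command below combines the two-stage design with the poststratification options. The variable names are the ones used in the text; placing poststrata() and postweight() among the first-stage options is an assumption based on the svyset syntax, so check [SVY] svyset before relying on it.

```stata
* Do-file sketch: /// continues a command onto the next line in a do-file.
* Stage 1: cities sampled within states; stage 2: schools within cities.
* poststrata()/postweight() are assumed here to go with the first stage.
svyset city [pw=wgt], strata(state) fpc(ncities)   ///
    poststrata(gender) postweight(ngender)         ///
    || school, fpc(nschools)

* Typing svyset by itself redisplays the current design settings.
svyset
```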

Here are all the details.

A new, unified syntax is used for declaring the design of survey data and for fitting models. For an overview of all survey facilities, see [SVY] survey.

All the old syntax continues to work under version control (the survey estimation commands do not even require version control), but the new features are not available when you use the old syntax.

Existing command svyset for declaring the survey design has new syntax that supports a host of new features in Stata’s survey-analysis facilities:

BRR and jackknife variance estimators have been added to the previously available linearization variance estimator. Moreover, use of BRR or jackknife (or linearization) can now be specified when you svyset or at estimation time.

Multistage designs can now be declared, and they may have primary, secondary, and lower-stage sampling units. The linearization variance estimator takes complete advantage of the information in multistage designs.

Stratification is now allowed in all stages, making variance estimates more efficient wherever stratification can be exploited.

Poststratification is now available and, like stratification, also makes variance estimates more efficient. Poststratification adjusts weights, improves variance estimates, and accounts for biases when demographic or other groupings are known.

Finite-population corrections are now allowed in all stages.

Sampling weights are handled under all three variance estimators.

For details, see [SVY] svyset. The previous svyset syntax continues to work under version control.
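
The two ways of choosing a variance estimator can be sketched as follows. This is a minimal illustration rather than a recommended workflow: income is a hypothetical analysis variable, and the single-stage design is a simplified version of the one in the text.

```stata
* Option 1: make the jackknife the default estimator when declaring the design
svyset city [pw=wgt], strata(state) vce(jackknife)
svy: mean income                 // uses the jackknife set above

* Option 2: choose the estimator for one command at estimation time
svy linearized: mean income      // linearization for this command only
svy jackknife:  mean income      // jackknife for this command only
```

BRR is requested the same way (vce(brr) or svy brr:), but it additionally needs replicate weights or a Hadamard matrix, which are not shown here.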

New prefix command svy: is how you tell estimators that you have survey data. You no longer type svyregress; you type svy: regress. This is not just a matter of style; svy really is a prefix command, and in fact, you can even use it as a prefix on estimation commands you write. In addition, svy: provides a standard, unified syntax for accessing Stata’s survey features, and svy: is easy to use because it automatically applies everything you have previously svyset, including the design.
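
A minimal sketch of the prefix in use, assuming hypothetical variables income, age, educ, and employed in a dataset that has the design variables from the text:

```stata
* Declare the design once ...
svyset city [pw=wgt], strata(state)

* ... and every svy: command picks it up automatically.
* Old style would have been: svyregress income age educ
svy: regress income age educ

* The same prefix works with other supported estimators
svy: logistic employed age educ
```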

The following estimators can be used with the svy: prefix:

Descriptive statistics

svy: mean               Population and subpopulation means
svy: proportion         Population and subpopulation proportions
svy: ratio              Population and subpopulation ratios
svy: total              Population and subpopulation totals
svy: tabulate oneway    One-way tables for survey data
svy: tabulate twoway    Two-way tables for survey data

Regression models

svy: regress            Linear regression
svy: ivreg              Instrumental variables regression
svy: intreg             Interval and censored regression
svy: logistic           Logistic regression, reporting odds ratios
svy: logit              Logistic regression, reporting coefficients
svy: probit             Probit regression
svy: mlogit             Multinomial logistic regression
svy: ologit             Ordered logistic regression
svy: oprobit            Ordered probit models
svy: poisson            Poisson regression
svy: nbreg              Negative binomial regression
svy: gnbreg             Generalized negative binomial regression
svy: heckman            Heckman selection model
svy: heckprob           Probit estimation with selection

Previously existing survey-estimation commands, such as svyregress, svymean, and svypoisson, continue to work as they did before, but only if your survey design is declared using version 8: svyset or if you are working with an old Stata 8 dataset. For a mapping from old estimation commands to the new syntax, see svy8. (The new prefix svy: works with datasets that were svyset under an earlier release of Stata.)

In addition to the three variance estimators and support for multistage sampling, the new svy: prefix provides other enhancements, including

Option subpop() allows more flexible selection of subpopulations, meaning that more general if conditions are now allowed.

Strata with only one sampling unit (sometimes called singleton PSUs) are now handled better—the coefficients are now reported, but with missing standard errors. svydes can now be used to find and describe these strata; see [SVY] svydes.

With BRR variance estimation, a Hadamard matrix can be used in place of BRR weights, and Fay’s adjustment may be specified; see [SVY] brr_options.

New command svy: proportion replaces svyprop. (By the way, new command proportion can be used without the svy: prefix; see [R] proportion.) Unlike svyprop, svy: proportion is an estimation command and computes a full covariance matrix for all the estimated proportions, allowing postestimation features, such as tests of linear and nonlinear combinations of proportions (test and testnl) or creation of linear and nonlinear combinations with confidence intervals (lincom and nlcom).

New commands ratio, total, and mean, used with the svy: prefix, use casewise deletion and estimate full covariance matrices for the estimates.

New command svy: tabulate oneway addresses a missing feature. Previously, anyone wanting a one-way tabulation had to create a constant and perform two-way survey tabulation with that constant.
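
Several of the enhancements listed above can be sketched in a few commands. The variables income, age, female, income1990, income2000, and region are hypothetical, and the data are assumed to have been svyset already.

```stata
* Subpopulation identified by an indicator variable (nonzero = member)
svy, subpop(female): mean income

* Subpopulation identified by a more general if condition
svy, subpop(if age >= 65 & female == 1): mean income

* svy: mean estimates a full covariance matrix, so postestimation
* commands can compare the estimated means
svy: mean income1990 income2000
lincom income2000 - income1990
test income1990 = income2000

* One-way survey tabulation, with no constant variable needed
svy: tabulate region
```

Using subpop() rather than an ordinary if qualifier keeps the full design in the variance calculation, which is why the option exists.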

New command estat computes and reports additional statistics and information after estimation with the svy: prefix:

estat svyset reports complete information on the survey design.

estat effects computes and reports the design effects—DEFF and DEFT—and the misspecification effects—MEFF and MEFT—in any combination for each estimated parameter.

estat effects can also compute DEFF and DEFT for subpopulations using simple random-sample estimates either from the overall population or from the subpopulation. estat effects replaces and extends the deff, deft, meff, and meft options previously available on survey estimators.

estat lceffects computes and reports the survey design effects and misspecification effects for any linear combination of estimated parameters.

estat size reports the sample and population sizes for each subpopulation after svy: mean, svy: proportion, svy: ratio, and svy: total.

For details on estat after survey estimation, see [SVY] estat.
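
The estat subcommands described above might be used as in the following sketch. The variables income, region, female, age, and educ are hypothetical, and the linear combination in the last line is chosen only to show the syntax.

```stata
* Descriptive estimation, then design information and design effects
svy: mean income, over(region)
estat svyset                       // report the survey design
estat effects, deff deft           // design effects for each estimate
estat size                         // sample and population sizes by subpopulation

* Design effects relative to simple random sampling within the subpopulation
svy, subpop(female): mean income
estat effects, deff deft srssubpop

* After a regression: design and misspecification effects
svy: regress income age educ
estat effects, deff deft meff meft
estat lceffects age - educ, deff deft
```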

Existing command svydes has several new features and options:

New option stage() lets you select the sampling stage for sample statistics to be reported.

New option generate() identifies strata with a single sampling unit.

New option finalstage replaces bypsu and reports observation sample statistics by sampling unit in the final stage.
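
A brief sketch of these svydes options, assuming a two-stage design such as the one set earlier; single is a hypothetical name for the new indicator variable.

```stata
svydes                       // summary for the first sampling stage
svydes, stage(2)             // summary for the second sampling stage
svydes, generate(single)     // single marks strata with one sampling unit
svydes, finalstage           // observation counts by final-stage sampling unit
```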

New options stdize() and stdweight() for commands svy: mean, svy: ratio, svy: proportion, svy: tabulate oneway, and svy: tabulate twoway allow direct standardization of means, ratios, proportions, and tabulations using any of the three survey variance estimators.
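
Direct standardization could look like the sketch below. All variable names are hypothetical: chol is the outcome, agegrp identifies the standard strata, stdpop holds the standard-population weight for each stratum, and region and diabetic are grouping variables.

```stata
* Mean standardized to a reference age distribution
svy: mean chol, stdize(agegrp) stdweight(stdpop)

* Standardized means compared across regions
svy: mean chol, over(region) stdize(agegrp) stdweight(stdpop)

* Standardized two-way tabulation
svy: tabulate region diabetic, stdize(agegrp) stdweight(stdpop)
```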

Programmers of estimation commands can get full support for estimation with survey and correlated data almost automatically. This support includes correct treatment of multistage designs, weighting, stratification, poststratification, and finite-population corrections, as well as access to all three variance estimators. For a discussion, see [P] program properties.

The [SVY] manual now has a glossary that defines commonly used terms in survey analysis and explains how these terms are used in the manual; see [SVY] glossary.
