This page contains only historical information and is not about the current release of Stata. Please see our Stata 18 page for information on the current version of Stata.


Survey statistics

With the addition of balanced repeated replication (BRR) and the survey jackknife, Stata is now the only full-featured statistical package to directly support all three major variance estimators for survey and correlated data: BRR, jackknife, and cluster-based linearization.

Complete support is now included for multistage designs and for poststratification. To handle multistage designs, you specify the design when you svyset your data. To set a two-stage cluster design on cities and schools within cities, stratifying on state, you type

     . svyset city [pw=wgt], strata(state) fpc(ncities) || school, fpc(nschools)

If you also wanted to poststratify on gender, you would add poststrata(gender) postweight(ngender). After this svyset, all of Stata’s survey estimators will properly account for your design.

Here are all the details.

A new, unified syntax is used for declaring the design of survey data and for fitting models. For an overview of all survey facilities, see [SVY] survey.

All the old syntax continues to work under version control (the survey estimation commands do not even require that), but if you use the old syntax, the new features will not be available.

  • Existing command svyset for declaring the survey design has new syntax that supports a host of new features in Stata’s survey-analysis facilities:

    • BRR and jackknife variance estimators have been added to the previously available linearization variance estimator. Moreover, use of BRR or jackknife (or linearization) can now be specified when you svyset or at estimation time.

    • Multistage designs can now be declared, and they may have primary, secondary, and lower-stage sampling units. The linearization variance estimator takes complete advantage of the information in multistage designs.

    • Stratification is now allowed in all stages, making variance estimates more efficient wherever stratification can be exploited.

    • Poststratification is now available and, like stratification, can make variance estimates more efficient. When population totals for demographic or other groupings are known, poststratification adjusts the weights to those totals, improves variance estimates, and accounts for biases.

    • Finite-population corrections are now allowed in all stages.

    • Sampling weights are handled under all three variance estimators.

    For details, see [SVY] svyset. The previous svyset syntax continues to work under version control.
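The two places where a variance estimator can be chosen might look like the following minimal sketch. It reuses city, state, and wgt from the example above; the analysis variable score is hypothetical and stands in for whatever you are estimating.

```stata
* Assumed variables: city, state, wgt (from the example above);
* score is a hypothetical analysis variable.

* Make the jackknife the default variance estimator when declaring the design
svyset city [pw=wgt], strata(state) vce(jackknife)

* Estimation commands now use the jackknife without further options
svy: mean score

* Override the default for one command at estimation time
svy linearized: mean score
```

Setting the estimator in svyset is convenient when every analysis of a dataset should use the same method; naming it after svy is handy for one-off comparisons.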

  • New prefix command svy: is how you tell estimators that you have survey data. You no longer type svyregress; you type svy: regress. This is not just a matter of style; svy really is a prefix command, and in fact, you can even use it as a prefix on estimation commands you write. In addition, svy: provides a standard, unified syntax for accessing Stata’s survey features, and svy: is easy to use because it automatically applies everything you have previously svyset, including the design.

    The following estimators can be used with the svy: prefix:

    Descriptive statistics
      svy: mean               Population and subpopulation means
      svy: proportion         Population and subpopulation proportions
      svy: ratio              Population and subpopulation ratios
      svy: total              Population and subpopulation totals
      svy: tabulate oneway    One-way tables for survey data
      svy: tabulate twoway    Two-way tables for survey data

    Regression models
      svy: regress            Linear regression
      svy: ivreg              Instrumental variables regression
      svy: intreg             Interval and censored regression
      svy: logistic           Logistic regression, reporting odds ratios
      svy: logit              Logistic regression, reporting coefficients
      svy: probit             Probit regression
      svy: mlogit             Multinomial logistic regression
      svy: ologit             Ordered logistic regression
      svy: oprobit            Ordered probit models
      svy: poisson            Poisson regression
      svy: nbreg              Negative binomial regression
      svy: gnbreg             Generalized negative binomial regression
      svy: heckman            Heckman selection model
      svy: heckprob           Probit estimation with selection

    Previously existing survey-estimation commands, such as svyregress, svymean, and svypoisson, continue to work as they did before, but only if your survey design is declared using version 8: svyset or if you are working with an old Stata 8 dataset. For a mapping from old estimation commands to the new syntax, see svy8. (The new prefix svy: works with datasets that were svyset under an earlier release of Stata.)
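As a brief illustration of the prefix in use, the sketch below declares the two-stage design from the earlier example once and then fits two models. The variables score, passed, and hours are hypothetical names chosen for the illustration.

```stata
* Declare the design once (command taken from the example earlier on this page)
svyset city [pw=wgt], strata(state) fpc(ncities) || school, fpc(nschools)

* Hypothetical analysis variables: score (continuous), passed (0/1), hours
* Each command picks up the svyset design automatically
svy: regress score hours
svy: logistic passed hours
```

Nothing about the design is repeated on the estimation commands; changing the svyset declaration changes how every subsequent svy: command computes its standard errors.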

    In addition to the three variance estimators and support for multistage sampling, the new svy: prefix provides other enhancements, including

    • Option subpop() allows more flexible selection of subpopulations, meaning that more general if conditions are now allowed.

    • Strata with only one sampling unit (sometimes called singleton PSUs) are now handled better—the coefficients are now reported, but with missing standard errors. svydes can now be used to find and describe these strata; see [SVY] svydes.

    • With BRR variance estimation, a Hadamard matrix can be used in place of BRR weights, and Fay’s adjustment may be specified; see [SVY] brr_options.
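The subpop() and BRR enhancements could be used along the following lines. This is only a sketch: the variables score, female, and age are hypothetical, and the BRR part assumes a design with two PSUs per stratum and few enough strata for a 4 x 4 Hadamard matrix (see [SVY] brr_options for the exact size requirement).

```stata
* Hypothetical variables: score, female (0/1), age

* Subpopulation selected with a general if condition
svy, subpop(if female == 1 & age >= 18): mean score

* Build a 4 x 4 Hadamard matrix from a 2 x 2 one using the
* Kronecker product operator (#)
matrix h2 = (-1, 1 \ 1, 1)
matrix h4 = h2 # h2

* BRR variance estimation using the Hadamard matrix in place of
* BRR weight variables, with Fay's adjustment set to 0.5
svy brr, hadamard(h4) fay(0.5): mean score
```

Larger Hadamard matrices can be built the same way by repeating the Kronecker product, for example h8 = h2 # h4.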

  • New command svy: proportion replaces svyprop. (By the way, new command proportion can be used without the svy: prefix; see [R] proportion.) Unlike svyprop, svy: proportion is an estimation command and computes a full covariance matrix for all the estimated proportions, allowing postestimation features, such as tests of linear and nonlinear combinations of proportions (test and testnl) or creation of linear and nonlinear combinations with confidence intervals (lincom and nlcom).

  • New commands ratio, total, and mean, used with the svy: prefix, use casewise deletion and estimate full covariance matrices for the estimates.
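Because the full covariance matrix is estimated, comparisons between estimates are straightforward. A minimal sketch, assuming two hypothetical variables pretest and posttest measured on the same respondents and data that have already been svyset:

```stata
* Hypothetical variables: pretest and posttest
* Observations missing either variable are dropped (casewise deletion)
svy: mean pretest posttest

* Difference in the two means, with a design-based confidence interval
lincom posttest - pretest

* Wald test that the two means are equal
test posttest = pretest
```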

  • New command svy: tabulate oneway addresses a missing feature. Previously, anyone wanting a one-way tabulation had to create a constant and perform two-way survey tabulation with that constant.
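With the new command, a one-way table needs no helper variable. A short sketch, assuming svyset data and a hypothetical categorical variable region:

```stata
* Hypothetical categorical variable: region
* Weighted counts with standard errors and confidence intervals
svy: tabulate region, count se ci
```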

  • New command estat computes and reports additional statistics and information after estimation with the svy: prefix:

    • estat svyset reports complete information on the survey design.

    • estat effects computes and reports the design effects—DEFF and DEFT—and the misspecification effects—MEFF and MEFT—in any combination for each estimated parameter.

    • estat effects can also compute DEFF and DEFT for subpopulations using simple random-sample estimates either from the overall population or from the subpopulation. estat effects replaces and extends the deff, deft, meff, and meft options previously available on survey estimators.

    • estat lceffects computes and reports the survey design effects and misspecification effects for any linear combination of estimated parameters.

    • estat size reports the sample and population sizes for each subpopulation after svy: mean, svy: proportion, svy: ratio, and svy: total.

    For details on estat after survey estimation, see [SVY] estat.
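A typical postestimation session might look like the sketch below. It assumes the data have already been svyset; score, hours, tutoring, and region are hypothetical variable names.

```stata
* Hypothetical variables: score, hours, tutoring, region
svy: regress score hours tutoring

* Report the survey design used for the estimates
estat svyset

* Design effects for each coefficient
estat effects, deff deft

* Design effects for a linear combination of coefficients
estat lceffects hours - tutoring

* Sample and population sizes by subpopulation
svy: mean score, over(region)
estat size
```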

  • Existing command svydes has several new features and options:

    • New option stage() lets you select the sampling stage for sample statistics to be reported.

    • New option generate() identifies strata with a single sampling unit.

    • New option finalstage replaces bypsu and reports observation sample statistics by sampling unit in the final stage.
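The new svydes options might be used as follows. The sketch assumes a two-stage design has already been declared with svyset; the variable name lonely is a hypothetical choice for the new indicator variable.

```stata
* Describe strata and sampling units (first stage by default)
svydes

* Report sample statistics for the second sampling stage
svydes, stage(2)

* Create a variable marking strata with a single sampling unit
* (lonely is a hypothetical new variable name)
svydes, generate(lonely)

* Report observation statistics by sampling unit in the final stage
svydes, finalstage
```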

  • New options stdize() and stdweight() for commands svy: mean, svy: ratio, svy: proportion, svy: tabulate oneway, and svy: tabulate twoway allow direct standardization of means, ratios, proportions, and tabulations using any of the three survey variance estimators.
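Direct standardization could be requested as in the sketch below. All variable names are hypothetical: agegrp identifies the standard strata, agewt holds the standard population weight for each stratum, and region defines the subpopulations being compared.

```stata
* Hypothetical variables: score, region, agegrp, agewt
* Means of score by region, standardized to a common age distribution
svy: mean score, over(region) stdize(agegrp) stdweight(agewt)
```

Standardizing to a common age distribution makes the regional means comparable even when the regions differ in age composition.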

  • Programmers of estimation commands can get full support for estimation with survey and correlated data almost automatically. This support includes correct treatment of multistage designs, weighting, stratification, poststratification, and finite-population corrections, as well as access to all three variance estimators. For a discussion, see [P] program properties.

  • The [SVY] manual now has a glossary that defines commonly used terms in survey analysis and explains how these terms are used in the manual; see [SVY] glossary.