Stata 15 help for svyset

[SVY] svyset -- Declare survey design for dataset

Syntax

Single-stage design

svyset [psu] [weight] [, design_options options]

Multiple-stage design

svyset psu [weight] [, design_options] [|| ssu , design_options] ... [options]

Clear the current settings

svyset, clear

Report the current settings

svyset

psu identifies the primary sampling units and may be _n or varname. In the single-stage syntax, psu is optional and defaults to _n.

_n indicates that individuals were randomly sampled if the design does not involve clustered sampling.

varname contains identifiers for the clusters in a clustered sampling design.

ssu is _n or varname containing identifiers for sampling units (clusters) in subsequent stages of the survey design.

_n indicates that individuals were randomly sampled within the last sampling stage.

design_options                 Description
-------------------------------------------------------------------------
Main
  strata(varname)              variable identifying strata
  fpc(varname)                 finite population correction
  weight(varname)              stage-level sampling weight
-------------------------------------------------------------------------

options                        Description
-------------------------------------------------------------------------
Weights
  brrweight(varlist)           balanced repeated replicate (BRR) weights
  fay(#)                       Fay's adjustment
  bsrweight(varlist)           bootstrap replicate weights
  bsn(#)                       bootstrap mean-weight adjustment
  jkrweight(varlist, jkropts)  jackknife replicate weights
  sdrweight(varlist, sdropts)  successive difference replicate (SDR)
                                 weights

SE
  vce(linearized)              Taylor linearized variance estimation
  vce(bootstrap)               bootstrap variance estimation
  vce(brr)                     BRR variance estimation
  vce(jackknife)               jackknife variance estimation
  vce(sdr)                     SDR variance estimation
  dof(#)                       design degrees of freedom
  mse                          use the MSE formula with vce(bootstrap),
                                 vce(brr), vce(jackknife), or vce(sdr)
  singleunit(method)           strata with a single sampling unit; method
                                 may be missing, certainty, scaled, or
                                 centered

Poststratification
  poststrata(varname)          variable identifying poststrata
  postweight(varname)          poststratum population sizes

Calibration
  rake(varlist, calopts)       adjust weights using the raking-ratio method
  regress(varlist, calopts)    adjust weights using linear regression
                                 calibration

  clear                        clear all settings from the data
  noclear                      change some of the settings without clearing
                                 the others
  clear(opnames)               clear the specified settings without
                                 clearing all others; opnames may be one or
                                 more of weight, vce, dof, mse, bsrweight,
                                 brrweight, jkrweight, sdrweight,
                                 poststrata, rake, or regress
-------------------------------------------------------------------------
pweights and iweights are allowed; see weights.
clear, noclear, and clear() are not shown in the dialog box.

jkropts                        Description
-------------------------------------------------------------------------
  stratum(# [# ...])           stratum identifier for each jackknife
                                 replicate weight
  fpc(# [# ...])               finite population correction for each
                                 jackknife replicate weight
  multiplier(# [# ...])        variance multiplier for each jackknife
                                 replicate weight
  reset                        reset characteristics for each jackknife
                                 replicate weight
-------------------------------------------------------------------------

sdropts                        Description
-------------------------------------------------------------------------
  fpc(# [# ...])               finite population correction for the SDR
                                 weights
-------------------------------------------------------------------------

calopts                        Description
-------------------------------------------------------------------------
* totals(spec)                 population totals
  noconstant                   suppress constant term
  ll(#)                        lower limit for weight ratios
  ul(#)                        upper limit for weight ratios
  iterate(#)                   maximum number of iterations
  tolerance(#)                 convergence tolerance
  force                        allow calibration adjustments that failed to
                                 converge
-------------------------------------------------------------------------
* totals() is required.

Menu

Statistics > Survey data analysis > Setup and utilities > Declare survey design for dataset

Description

svyset manages the survey analysis settings of a dataset. You use svyset to designate variables that contain information about the survey design, such as the sampling units and weights. svyset is also used to specify other design characteristics, such as the number of sampling stages and the sampling method, and analysis defaults, such as the method for variance estimation. You must svyset your data before using any svy command; see [SVY] svy estimation.

svyset without arguments reports the current settings. svyset, clear removes the current survey settings.
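The declare, report, and clear cycle described above can be sketched as follows. This is a minimal illustration that assumes the stage5a example dataset (used in the Examples section of this help file) can be downloaded with webuse.

```stata
* Load the example data, replacing whatever is in memory
webuse stage5a, clear

* Declare a one-stage clustered design with strata and sampling weights
svyset su1 [pweight=pw], strata(strata)

* Report the current settings
svyset

* Remove all survey settings from the data in memory, then confirm
svyset, clear
svyset
```

After svyset, clear, any svy command will refuse to run until the design is declared again.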

Options

    +------+
----+ Main +-------------------------------------------------------------

strata(varname) specifies the name of a variable (numeric or string) that contains stratum identifiers.

fpc(varname) requests a finite population correction for the variance estimates. If varname has values less than or equal to 1, it is interpreted as a stratum sampling rate f_h = n_h/N_h, where n_h = number of units sampled from stratum h and N_h = total number of units in the population belonging to stratum h. If varname has values greater than or equal to n_h, it is interpreted as containing N_h. It is an error for varname to have values between 1 and n_h or to have a mixture of sampling rates and stratum sizes.
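The two interpretations of fpc(varname) can be compared with a short sketch. It assumes the fpc example dataset used in the Examples section, in which Nh holds the stratum population sizes; the variables psutag, n_h, and rate are created here purely for illustration.

```stata
webuse fpc, clear

* Count the sampled PSUs in each stratum (n_h)
egen psutag = tag(stratid psuid)
bysort stratid: egen n_h = total(psutag)

* Sampling rate f_h = n_h/N_h
generate double rate = n_h / Nh

* FPC supplied as stratum population sizes N_h
svyset psuid [pweight=weight], strata(stratid) fpc(Nh)
svy: mean x

* FPC supplied as stratum sampling rates f_h
svyset psuid [pweight=weight], strata(stratid) fpc(rate)
svy: mean x
```

Because both variables describe the same design, the two svy: mean calls should report the same standard error.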

weight(varname) specifies a stage-level sampling weight variable. For most models, stage-level sampling weights are multiplied together to create a single observation-level sampling weight variable used for weighted estimation. For commands such as gsem and meglm, each stage-level weight variable is assumed to correspond with a hierarchical group level in the model and is used to compute the pseudolikelihood at that associated group level. Stage-level sampling weights are required to be constant within their corresponding group level. For examples of fitting a multilevel model with stage-level sampling weights, see example 5 and example 6 in [ME] meglm.
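As a rough sketch of the syntax only, stage-level weights can be declared one stage at a time. The toy data and the variable names school, class, wt_school, and wt_class below are hypothetical; each weight is constant within its own group level, as required.

```stata
* Hypothetical two-stage sample: classes nested within schools
clear
input school class wt_school wt_class
1 1 10 2
1 1 10 2
1 2 10 3
1 2 10 3
2 1 12 2
2 1 12 2
2 2 12 4
2 2 12 4
end

* One weight() per stage: schools at stage 1, classes at stage 2
svyset school, weight(wt_school) || class, weight(wt_class)
```

For most svy estimators the two weights would be multiplied into a single observation-level weight; multilevel commands such as meglm would instead use each weight at its own level.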

    +---------+
----+ Weights +----------------------------------------------------------

brrweight(varlist) specifies the replicate-weight variables to be used with vce(brr) or with svy brr.

fay(#) specifies Fay's adjustment (Judkins 1990). The value specified in fay(#) is used to adjust the BRR weights and is present in the BRR variance formulas.

The sampling weight of the selected PSUs for a given replicate is multiplied by 2-#, whereas the sampling weight of the unselected PSUs is multiplied by #. When brrweight(varlist) is specified, the replicate-weight variables in varlist are assumed to be adjusted using #.

fay(0) is the default and is equivalent to the original BRR method. # must be between 0 and 2, inclusive, but may not equal 1: fay(1) is not allowed because it results in unadjusted weights.
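The multipliers 2-# and # can be illustrated with a toy calculation. This sketch only shows how one replicate weight would be formed from a base weight; the data, the selected indicator, and the value 0.5 are hypothetical, and the sketch does not choose the half-sample the way a real BRR design would.

```stata
* Toy data: two strata with two PSUs each; selected marks the PSU
* retained in this (hypothetical) replicate, and w is the base weight
clear
input stratum psu selected w
1 1 1 10
1 2 0 10
2 1 0 20
2 2 1 20
end

* Fay's adjustment value, as it would be given in fay(#)
local fay = 0.5

* Selected PSUs get w*(2-#); unselected PSUs get w*#
generate double repw = w * cond(selected, 2 - `fay', `fay')
list
```

With # set to 0 the unselected PSUs would receive weight 0 and the selected PSUs would have their weight doubled, which is the original BRR method.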

bsrweight(varlist) specifies the replicate-weight variables to be used with vce(bootstrap) or with svy bootstrap.

bsn(#) specifies that # bootstrap replicate-weight variables were used to generate each bootstrap mean-weight variable specified in the bsrweight() option. The default is bsn(1). The value specified in bsn(#) is used to adjust the variance estimate to account for mean bootstrap weights.

jkrweight(varlist, jkropts) specifies the replicate-weight variables to be used with vce(jackknife) or with svy jackknife.

The following jkropts set characteristics on the jackknife replicate-weight variables. If one value is specified, all the specified jackknife replicate-weight variables will be supplied with the same characteristic. If multiple values are specified, each replicate-weight variable will be supplied with the corresponding value according to the order specified. jkropts are not shown in the dialog box.

stratum(# [# ...]) specifies an identifier for the stratum in which the sampling weights have been adjusted.

fpc(# [# ...]) specifies the FPC value to be added as a characteristic of the jackknife replicate-weight variables. The values set by this suboption have the same interpretation as the fpc(varname) option.

multiplier(# [# ...]) specifies the value of a jackknife multiplier to be added as a characteristic of the jackknife replicate-weight variables.

reset indicates that the characteristics for the replicate-weight variables may be overwritten or reset to the default, if they exist.

sdrweight(varlist, sdropts) specifies the replicate-weight variables to be used with vce(sdr) or with svy sdr. The following sdropts is available:

fpc(#) specifies the FPC value associated with the SDR weights. The value set by this suboption has the same interpretation as the fpc(varname) option. This option is not shown in the dialog box.

    +----+
----+ SE +---------------------------------------------------------------

vce(vcetype) specifies the default method for variance estimation; see [SVY] variance estimation.

vce(linearized) sets the default to Taylor linearization.

vce(bootstrap) sets the default to the bootstrap; also see [SVY] svy bootstrap.

vce(brr) sets the default to BRR; also see [SVY] svy brr.

vce(jackknife) sets the default to the jackknife; also see [SVY] svy jackknife.

vce(sdr) sets the default to SDR; also see [SVY] svy sdr.

dof(#) specifies the design degrees of freedom, overriding the default calculation, df = N_psu - N_strata.
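A short sketch of the default calculation and of the override, assuming the fpc example dataset from the Examples section. The value 4 passed to dof() is arbitrary and chosen only to make the change visible.

```stata
webuse fpc, clear

* Default: design df = number of PSUs - number of strata
svyset psuid [pweight=weight], strata(stratid)
svy: mean x
display "default df = " e(df_r) "  (PSUs = " e(N_psu) ", strata = " e(N_strata) ")"

* Override the default with an arbitrary, illustrative value
svyset psuid [pweight=weight], strata(stratid) dof(4)
svy: mean x
display "overridden df = " e(df_r)
```

The design degrees of freedom affect the t-based confidence intervals and tests, not the point estimates or standard errors.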

mse specifies that the MSE formula be used for variance estimation. This option requires vce(bootstrap), vce(brr), vce(jackknife), or vce(sdr).

singleunit(method) specifies how to handle strata with one sampling unit.

singleunit(missing) results in missing values for the standard errors and is the default.

singleunit(certainty) causes strata with single sampling units to be treated as certainty units. Certainty units contribute nothing to the standard error.

singleunit(scaled) results in a scaled version of singleunit(certainty). For each stratum with one sampling unit, the scaling factor uses the average of the variances from the strata with multiple sampling units.

singleunit(centered) specifies that strata with one sampling unit are centered at the grand mean instead of the stratum mean.
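The effect of singleunit() can be seen by artificially creating a stratum with one sampling unit. The sketch below assumes the fpc example dataset and deliberately drops PSUs from stratum 2 for illustration only; it is not a recommended way to prepare data.

```stata
webuse fpc, clear

* Leave stratum 2 with a single PSU (illustration only)
bysort stratid (psuid): drop if stratid == 2 & psuid != psuid[1]

* Default, singleunit(missing): the standard error is reported as missing
svyset psuid [pweight=weight], strata(stratid)
svy: mean x

* Treat the lone PSU as a certainty unit so that a standard error is reported
svyset psuid [pweight=weight], strata(stratid) singleunit(certainty)
svy: mean x
```

Which method is appropriate depends on why the stratum has only one unit; certainty is suitable only when the unit really was selected with probability 1.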

    +--------------------+
----+ Poststratification +-----------------------------------------------

poststrata(varname) specifies the name of the variable (numeric or string) that contains poststratum identifiers. See [SVY] poststratification for more information.

postweight(varname) specifies the name of the numeric variable that contains poststratum population totals (or sizes), that is, the number of elementary sampling units in the population within each poststratum. See [SVY] poststratification for more information.
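A minimal sketch of declaring poststratification, assuming the poststrata example dataset is available through webuse and contains the variables type (poststratum identifier), postwgt (poststratum population sizes), and totexp (the analysis variable).

```stata
webuse poststrata, clear

* Declare the poststrata and their population sizes
svyset, poststrata(type) postweight(postwgt)

* Estimation now adjusts the weights to the poststratum totals
svy: mean totexp
```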

    +-------------+
----+ Calibration +------------------------------------------------------

rake(varlist, calopts) and regress(varlist, calopts) specify that the sampling weights be adjusted using a calibration adjustment. See [SVY] calibration for more information.

rake() specifies that the weights be adjusted by the raking-ratio method.

regress() specifies that the weights be adjusted by linear regression.

The following calopts are available:

totals(spec) specifies the population totals corresponding to the variables specified in varlist. spec is one of

matname [, skip copy]

{ [eqname:]name = # | /eqname = # } [...]

# [# ...], copy

That is, spec may be a matrix name, for example, totals(poptotals); a list of variable names in varlist with their population totals, for example, totals(_cons=1300 dogs=850 cats=450); or a list of values, for example, totals(850 450 1300).

skip specifies that any parameters found in the specified totals vector that are not also found in the model be ignored. The default action is to issue an error message.

copy specifies that the list of values or the totals vector be copied into the population-totals vector by position rather than by name.

noconstant suppresses the intercept in the linear regression adjustment.

ll(#) specifies a lower limit for the weight ratios for truncated linear calibration.

ul(#) specifies an upper limit for the weight ratios for truncated linear calibration.

iterate(#) specifies the maximum number of iterations. When the number of iterations equals iterate(), the calibration adjustment stops and presents a note. The default is iterate(1000).

tolerance(#) specifies the tolerance for the Lagrange multiplier in the calibration equations. Convergence is achieved when the relative change in the Lagrange multiplier from one iteration to the next is less than or equal to tolerance(). The default is tolerance(1e-7).

force prevents svy from exiting with an error if the calibration adjustment fails to converge.

The following options are available with svyset but are not shown in the dialog box:

clear clears all the settings from the data. Typing

. svyset, clear

clears the survey design characteristics from the data in memory. Although this option may be specified with some of the other svyset options, it is redundant because svyset automatically clears the previous settings before setting new survey design characteristics.

noclear allows some of the options in options to be changed without clearing all the other settings. This option is not allowed with psu, ssu, design_options, or clear.

clear(opnames) allows some of the options in options to be cleared without clearing all the other settings. opnames refers to an option name and may be one or more of the following: weight, vce, dof, mse, brrweight, bsrweight, jkrweight, sdrweight, poststrata, rake, or regress.

This option implies the noclear option.
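As a sketch of how noclear and clear(opnames) might be combined, the commands below add one setting to an existing design and then remove only that setting. The stage5a dataset comes from the Examples section; the value in dof(10) is arbitrary and used only for illustration.

```stata
webuse stage5a, clear

* Declare the design
svyset su1 [pweight=pw], strata(strata)

* Add a dof() setting without redeclaring the sampling units or weights
svyset, noclear dof(10)

* Remove only the dof() setting; the rest of the design is kept
svyset, clear(dof)
svyset
```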

Examples

Setup
    . webuse stage5a

Simple random sampling with replacement
    . svyset _n

One-stage clustered design with stratification
    . svyset su1 [pweight=pw], strata(strata)

Two-stage designs
    . svyset su1 [pweight=pw], fpc(fpc1) || _n, fpc(fpc2)
    . svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2)
    . svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2) strata(strata)

Multiple-stage designs
    . svyset su1 [pweight=pw], fpc(fpc1) strata(strata) || su2, fpc(fpc2) || su3, fpc(fpc3)
    . svyset su1 [pweight=pw], fpc(fpc1) strata(strata) || su2, fpc(fpc2) || su3, fpc(fpc3) || _n

Finite population correction (FPC)
    . webuse fpc
    . list
    . svyset psuid [pweight=weight], strata(stratid) fpc(Nh)
    . svy: mean x
    . svyset psuid [pweight=weight], strata(stratid)
    . svy: mean x

Multiple-stage designs and with-replacement sampling
    . webuse stage5a
    . svyset su1 || _n, fpc(fpc2)

Replication weight variables
    . webuse stage5a_jkw
    . svyset [pweight=pw], jkrweight(jkw_*) vce(jackknife)
    . svyset [pweight=pw], jkrweight(jkw_*) vce(jackknife) mse

Video example

Specifying the design of your survey data to Stata

Stored results

svyset stores the following in r():

Scalars
  r(stages)        number of sampling stages
  r(stages_wt)     last stage containing stage-level weights

Macros
  r(wtype)         weight type
  r(wexp)          weight expression
  r(wvar)          weight variable name
  r(weight#)       variable identifying weight for stage #
  r(su#)           variable identifying sampling units for stage #
  r(strata#)       variable identifying strata for stage #
  r(fpc#)          FPC for stage #
  r(bsrweight)     bsrweight() variable list
  r(bsn)           bootstrap mean-weight adjustment
  r(brrweight)     brrweight() variable list
  r(fay)           Fay's adjustment
  r(jkrweight)     jkrweight() variable list
  r(sdrweight)     sdrweight() variable list
  r(sdrfpc)        fpc() value from within sdrweight()
  r(vce)           vcetype specified in vce()
  r(dof)           dof() value
  r(mse)           mse, if specified
  r(poststrata)    poststrata() variable
  r(postweight)    postweight() variable
  r(rake)          rake() specification
  r(regress)       regress() specification
  r(settings)      svyset arguments to reproduce the current settings
  r(singleunit)    singleunit() setting
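The stored results can be inspected right after svyset runs. A minimal sketch, assuming the stage5a example dataset from the Examples section:

```stata
webuse stage5a, clear

* Declare a two-stage design
svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2)

* List everything svyset left in r()
return list

* Use individual results
display "number of stages: " r(stages)
display `"settings: `r(settings)'"'
```

r(settings) is convenient for saving a design and reapplying it later, for example after svyset, clear.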

Reference

Judkins, D. R. 1990. Fay's method for variance estimation. Journal of Official Statistics 6: 223-239.

