**[SVY] svyset** -- Declare survey design for dataset

__Syntax__

Single-stage design

**svyset** [*psu*] [*weight*] [**,** *design_options* *options*]

Multiple-stage design

**svyset** *psu* [*weight*] [**,** *design_options*] [**||** *ssu* **,** *design_options*] ...
[*options*]

Clear the current settings

**svyset,** **clear**

Report the current settings

**svyset**

*psu* identifies the primary sampling units and may be **_n** or *varname*. In
the single-stage syntax, *psu* is optional and defaults to **_n**.

**_n** indicates that individuals were randomly sampled if the design
does not involve clustered sampling.

*varname* contains identifiers for the clusters in a clustered sampling
design.

*ssu* is **_n** or *varname* containing identifiers for sampling units (clusters)
in subsequent stages of the survey design.

**_n** indicates that individuals were randomly sampled within the last
sampling stage.

*design_options* Description
-------------------------------------------------------------------------
Main
__str__**ata(***varname***)** variable identifying strata
**fpc(***varname***)** finite population correction
**weight(***varname***)** stage-level sampling weight
-------------------------------------------------------------------------

*options* Description
-------------------------------------------------------------------------
Weights
__brr__**weight(***varlist***)** balanced repeated replicate (BRR) weights
**fay(***#***)** Fay's adjustment
__bsr__**weight(***varlist***)** bootstrap replicate weights
**bsn(***#***)** bootstrap mean-weight adjustment
__jkr__**weight(***varlist***,** *jkropts***)** jackknife replicate weights
__sdr__**weight(***varlist***,** *sdropts***)** successive difference replicate (SDR)
weights

SE
**vce(**__linear__**ized)** Taylor linearized variance estimation
**vce(bootstrap)** bootstrap variance estimation
**vce(brr)** BRR variance estimation
**vce(**__jack__**knife)** jackknife variance estimation
**vce(sdr)** SDR variance estimation
**dof(***#***)** design degrees of freedom
**mse** use the MSE formula with **vce(bootstrap)**,
**vce(brr)**, **vce(jackknife)**, or **vce(sdr)**
__single__**unit(***method***)** strata with a single sampling unit; *method*
may be __mis__**sing**, __cer__**tainty**, __sca__**led**, or
__cen__**tered**

Poststratification
__posts__**trata(***varname***)** variable identifying poststrata
__postw__**eight(***varname***)** poststratum population sizes

Calibration
**rake(***varlist***,** *calopts***)** adjust weights using the raking-ratio
method
__reg__**ress(***varlist***,** *calopts***)** adjust weights using linear regression
calibration

**clear** clear all settings from the data
**noclear** change some of the settings without
clearing the others
**clear(***opnames***)** clear the specified settings without
clearing all others; *opnames* may be one
or more of __w__**eight**, **vce**, **dof**, **mse**,
__bsr__**weight**, __brr__**weight**, __jkr__**weight**,
__sdr__**weight**, __post__**strata**, **rake**, or **regress**
-------------------------------------------------------------------------
**pweight**s and **iweight**s are allowed; see weights.
**clear**, **noclear**, and **clear()** are not shown in the dialog box.

*jkropts* Description
-------------------------------------------------------------------------
__str__**atum(***#* [*#* ...]**)** stratum identifier for each jackknife
replicate weight
**fpc(***#* [*#* ...]**)** finite population correction for each
jackknife replicate weight
__mult__**iplier(***#* [*#* ...]**)** variance multiplier for each jackknife
replicate weight
**reset** reset characteristics for each jackknife
replicate weight
-------------------------------------------------------------------------

*sdropts* Description
-------------------------------------------------------------------------
**fpc(***#***)** finite population correction for the SDR
weights
-------------------------------------------------------------------------

*calopts* Description
-------------------------------------------------------------------------
* __tot__**als(***spec***)** population totals
__nocons__**tant** suppress constant term
**ll(***#***)** lower limit for weight ratios
**ul(***#***)** upper limit for weight ratios
__iter__**ate(***#***)** maximum number of iterations
__tol__**erance(***#***)** convergence tolerance
**force** allow calibration adjustments that failed
to converge
-------------------------------------------------------------------------
* **totals()** is required.

__Menu__

**Statistics > Survey data analysis > Setup and utilities >** **Declare survey**
**design for dataset**

__Description__

**svyset** manages the survey analysis settings of a dataset. You use **svyset**
to designate variables that contain information about the survey design,
such as the sampling units and weights. **svyset** is also used to specify
other design characteristics, such as the number of sampling stages and
the sampling method, and analysis defaults, such as the method for
variance estimation. You must **svyset** your data before using any **svy**
command; see **[SVY] svy estimation**.

**svyset** without arguments reports the current settings. **svyset, clear**
removes the current survey settings.
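
As a minimal sketch of the declare, report, and clear cycle described above, the following uses the stage5a dataset and variable names that appear under Examples; it illustrates typical use and is not the only workflow.

```stata
* Declare a stratified one-stage clustered design
webuse stage5a, clear
svyset su1 [pweight=pw], strata(strata)

* Report the current settings
svyset

* Remove the settings, then confirm that none remain
svyset, clear
svyset
```

The final **svyset** should report that no survey characteristics are set.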

__Options__

+------+
----+ Main +-------------------------------------------------------------

**strata(***varname***)** specifies the name of a variable (numeric or string) that
contains stratum identifiers.

**fpc(***varname***)** requests a finite population correction for the variance
estimates. If *varname* has values less than or equal to 1, it is
interpreted as a stratum sampling rate *f*_*h* = *n*_*h*/*N*_*h*, where *n*_*h* =
number of units sampled from stratum *h* and *N*_*h* = total number of
units in the population belonging to stratum *h*. If *varname* has
values greater than or equal to *n*_*h*, it is interpreted as containing
*N*_*h*. It is an error for *varname* to have values between 1 and *n*_*h* or
to have a mixture of sampling rates and stratum sizes.
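
The two interpretations of **fpc()** can be compared with a short sketch. It uses the fpc dataset from the Examples and assumes that each observation in that dataset is one sampled PSU, so that the within-stratum observation count equals *n*_*h*; the variable **rate** and the scalars **se_size** and **se_rate** are names introduced here for illustration.

```stata
webuse fpc, clear

* FPC supplied as stratum population sizes N_h
svyset psuid [pweight=weight], strata(stratid) fpc(Nh)
svy: mean x
matrix V = e(V)
scalar se_size = sqrt(V[1,1])

* FPC supplied as sampling rates f_h = n_h/N_h
* (assumption: one observation per sampled PSU, so _N within stratum is n_h)
bysort stratid: generate double rate = _N/Nh
svyset psuid [pweight=weight], strata(stratid) fpc(rate)
svy: mean x
matrix V = e(V)
scalar se_rate = sqrt(V[1,1])

* The two standard errors should agree
display se_size " " se_rate
```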

**weight(***varname***)** specifies a stage-level sampling weight variable. For
most models, stage-level sampling weights are multiplied together to
create a single observation-level sampling weight variable used for
weighted estimation. For commands such as **gsem** and **meglm**, each
stage-level weight variable is assumed to correspond with a
hierarchical group level in the model and is used to compute the
pseudolikelihood at that associated group level. Stage-level
sampling weights are required to be constant within their
corresponding group level. For examples of fitting a multilevel
model with stage-level sampling weights, see example 5 and example 6
in **[ME] meglm**.
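
The following sketch declares stage-level weights on made-up data. The variables **school**, **wt_school**, and **wt_student** are hypothetical and exist only to show the syntax; each weight is constant within its group level, as required above.

```stata
* Toy data: 4 schools with 5 students each (hypothetical values)
clear
set obs 20
generate school = ceil(_n/5)
generate double wt_school = 10
generate double wt_student = 4

* One weight() per stage: schools at stage 1, students at stage 2
svyset school, weight(wt_school) || _n, weight(wt_student)

* Report the settings, including the stage-level weights
svyset
```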

+---------+
----+ Weights +----------------------------------------------------------

**brrweight(***varlist***)** specifies the replicate-weight variables to be used
with **vce(brr)** or with **svy** **brr**.

**fay(***#***)** specifies Fay's adjustment (Judkins 1990). The value specified in
**fay(***#***)** is used to adjust the BRR weights and is present in the BRR
variance formulas.

For a given replicate, the sampling weight of each selected PSU is
multiplied by **2-***#*, and the sampling weight of each unselected PSU
is multiplied by *#*. When **brrweight(***varlist***)** is specified, the
replicate-weight variables in *varlist* are assumed to have been adjusted
using *#*.

**fay(0)** is the default and is equivalent to the original BRR method.
*#* must be between 0 and 2, inclusive, and excluding 1. **fay(1)** is not
allowed because this results in unadjusted weights.
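
The arithmetic of Fay's adjustment can be illustrated on two made-up PSUs. This is only a sketch of the multipliers for a single replicate, not of how BRR replicates are constructed; the variables **selected**, **pw**, and **rw** and the value 0.5 are assumptions chosen for illustration.

```stata
* Two PSUs in one stratum, each with sampling weight 100 (hypothetical)
clear
input psu selected pw
1 1 100
2 0 100
end

* With fay(0.5), selected PSUs get 2 - 0.5 and unselected PSUs get 0.5
local fay = 0.5
generate double rw = cond(selected, (2 - `fay')*pw, `fay'*pw)
list
```

With **fay(0)**, the same formula gives multipliers of 2 and 0, which is the original BRR method.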

**bsrweight(***varlist***)** specifies the replicate-weight variables to be used
with **vce(bootstrap)** or with **svy** **bootstrap**.

**bsn(***#***)** specifies that *#* bootstrap replicate-weight variables were used to
generate each bootstrap mean-weight variable specified in the
**bsrweight()** option. The default is **bsn(1)**. The value specified in
**bsn(***#***)** is used to adjust the variance estimate to account for mean
bootstrap weights.

**jkrweight(***varlist***,** *jkropts***)** specifies the replicate-weight variables to
be used with **vce(jackknife)** or with **svy** **jackknife**.

The following *jkropts* set characteristics on the jackknife
replicate-weight variables. If one value is specified, all the
specified jackknife replicate-weight variables will be supplied with
the same characteristic. If multiple values are specified, each
replicate-weight variable will be supplied with the corresponding
value according to the order specified. *jkropts* are not shown in the
dialog box.

**stratum(***#* [*#* ...]**)** specifies an identifier for the stratum in which
the sampling weights have been adjusted.

**fpc(***#* [*#* ...]**)** specifies the FPC value to be added as a
characteristic of the jackknife replicate-weight variables. The
values set by this suboption have the same interpretation as the
**fpc(***varname***)** option.

**multiplier(***#* [*#* ...]**)** specifies the value of a jackknife multiplier
to be added as a characteristic of the jackknife replicate-weight
variables.

**reset** indicates that the characteristics for the replicate-weight
variables may be overwritten or reset to the default, if they
exist.

**sdrweight(***varlist***,** *sdropts***)** specifies the replicate-weight variables to
be used with **vce(sdr)** or with **svy** **sdr**. The following *sdropts* is
available:

**fpc(***#***)** specifies the FPC value associated with the SDR weights. The
value set by this suboption has the same interpretation as the
**fpc(***varname***)** option. This option is not shown in the dialog box.

+----+
----+ SE +---------------------------------------------------------------

**vce(***vcetype***)** specifies the default method for variance estimation; see
**[SVY] variance estimation**.

**vce(linearized)** sets the default to Taylor linearization.

**vce(bootstrap)** sets the default to the bootstrap; also see **[SVY] svy**
**bootstrap**.

**vce(brr)** sets the default to BRR; also see **[SVY] svy brr**.

**vce(jackknife)** sets the default to the jackknife; also see **[SVY] svy**
**jackknife**.

**vce(sdr)** sets the default to SDR; also see **[SVY] svy sdr**.

**dof(***#***)** specifies the design degrees of freedom, overriding the default
calculation, df = N_psu - N_strata.
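
As a sketch, the default design degrees of freedom can be compared with an override. The example uses the fpc dataset from the Examples; the value 4 passed to **dof()** is arbitrary and chosen only to show the effect.

```stata
webuse fpc, clear
svyset psuid [pweight=weight], strata(stratid)
svy: mean x

* Default: number of PSUs minus number of strata
display e(N_psu) - e(N_strata)
display e(df_r)

* Override the default without disturbing the other settings
svyset, dof(4) noclear
svy: mean x
display e(df_r)
```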

**mse** specifies that the MSE formula be used for variance estimation. This
option requires **vce(bootstrap)**, **vce(brr)**, **vce(jackknife)**, or
**vce(sdr)**.

**singleunit(***method***)** specifies how to handle strata with one sampling unit.

**singleunit(missing)** results in missing values for the standard errors
and is the default.

**singleunit(certainty)** causes strata with single sampling units to be
treated as certainty units. Certainty units contribute nothing
to the standard error.

**singleunit(scaled)** results in a scaled version of
**singleunit(certainty)**. The scaling factor comes from using the
average of the variances from the strata with multiple sampling
units for each stratum with one sampling unit.

**singleunit(centered)** specifies that strata with one sampling unit are
centered at the grand mean instead of the stratum mean.
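
The effect of **singleunit()** can be sketched with toy data in which one stratum contains a single PSU. The variables **stratum**, **psu**, and **y** and their values are made up for illustration.

```stata
* Stratum 1 has two PSUs; stratum 2 has only one (hypothetical data)
clear
input stratum psu y
1 1 10
1 2 12
2 3 20
end

* Default singleunit(missing): the standard error is reported as missing
svyset psu, strata(stratum)
svy: mean y

* Treat the singleton stratum as a certainty unit instead
svyset, singleunit(certainty) noclear
svy: mean y
matrix V = e(V)
display sqrt(V[1,1])
```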

+--------------------+
----+ Poststratification +-----------------------------------------------

**poststrata(***varname***)** specifies the name of the variable (numeric or
string) that contains poststratum identifiers. See **[SVY]**
**poststratification** for more information.

**postweight(***varname***)** specifies the name of the numeric variable that
contains poststratum population totals (or sizes), that is, the
number of elementary sampling units in the population within each
poststratum. See **[SVY] poststratification** for more information.
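
A minimal poststratification sketch on made-up data follows. The variables **group**, **y**, and **popsize** and the population sizes 300 and 700 are hypothetical; **popsize** is constant within each poststratum, as a population size must be.

```stata
* Five sampled individuals in two poststrata (hypothetical data)
clear
input group y
1 10
1 12
1 11
2 20
2 22
end

* Known population sizes for each poststratum (assumed values)
generate popsize = cond(group == 1, 300, 700)

* Individuals sampled directly, then adjusted to the known sizes
svyset _n, poststrata(group) postweight(popsize)
svy: mean y
matrix b = e(b)
```

Because the second poststratum is larger in the population than in the sample, the poststratified mean should lie closer to that group's mean than the unadjusted sample mean does.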

+-------------+
----+ Calibration +------------------------------------------------------

**rake(***varlist***,** *calopts***)** and **regress(***varlist***,** *calopts***)** specify that the
sampling weights be adjusted using a calibration adjustment. See
**[SVY] calibration** for more information.

**rake()** specifies that the weights be adjusted by the raking-ratio
method.

**regress()** specifies that the weights be adjusted by linear
regression.

The following *calopts* are available:

**totals(***spec***)** specifies the population totals corresponding to the
variables specified in *varlist*. *spec* is one of

*matname* [**,** **skip** **copy**]

{ [*eqname***:**]*name* **=** *#* | **/***eqname* **=** *#* } [*...*]

*#* [*#* *...*]**,** **copy**

That is, *spec* may be a matrix name, for example,
**totals(poptotals)**; a list of variable names in *varlist* with
their population totals, for example, **totals(_cons=1300**
**dogs=850 cats=450)**; or a list of values, for example,
**totals(850 450 1300)**.

**skip** specifies that any parameters found in the specified
totals vector that are not also found in the model be
ignored. The default action is to issue an error
message.

**copy** specifies that the list of values or the totals vector
be copied into the population-totals vector by position
rather than by name.

**noconstant** suppresses the intercept in the linear regression
adjustment.

**ll(***#***)** specifies a lower limit for the weight ratios for truncated
linear calibration.

**ul(***#***)** specifies an upper limit for the weight ratios for
truncated linear calibration.

**iterate(***#***)** specifies the maximum number of iterations. When the
number of iterations equals **iterate()**, the calibration
adjustment stops and presents a note. The default is
**iterate(1000)**.

**tolerance(***#***)** specifies the tolerance for the Lagrange multiplier
in the calibration equations. Convergence is achieved when
the relative change in the Lagrange multiplier from one
iteration to the next is less than or equal to **tolerance()**.
The default is **tolerance(1e-7)**.

**force** prevents **svy** from exiting with an error if the calibration
adjustment fails to converge.
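
As a sketch of linear regression calibration, the following reuses the **totals()** specification quoted above on made-up data. The variables **dogs**, **cats**, and **w** and their values are hypothetical, and the example assumes a Stata release that supports the calibration options.

```stata
* Five sampled households with equal sampling weights (hypothetical data)
clear
input dogs cats w
1 0 100
1 1 100
0 1 100
0 0 100
1 0 100
end

* Calibrate to the population totals, matched by name
svyset _n [pweight=w], regress(dogs cats, totals(_cons=1300 dogs=850 cats=450))

* Estimated totals should reproduce the calibration targets
svy: total dogs cats
matrix b = e(b)
matrix list b
```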

The following options are available with **svyset** but are not shown in the
dialog box:

**clear** clears all the settings from the data. Typing

**. svyset, clear**

clears the survey design characteristics from the data in memory.
Although this option may be specified with some of the other **svyset**
options, it is redundant because **svyset** automatically clears the
previous settings before setting new survey design characteristics.

**noclear** allows some of the options in *options* to be changed without
clearing all the other settings. This option is not allowed with
*psu*, *ssu*, *design_options*, or **clear**.

**clear(***opnames***)** allows some of the options in *options* to be cleared
without clearing all the other settings. *opnames* refers to an option
name and may be one or more of the following: **weight**, **vce**, **dof**, **mse**,
**brrweight**, **bsrweight**, **jkrweight**, **sdrweight**, **poststrata**, **rake**, or
**regress**.

This option implies the **noclear** option.
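
The difference between **clear(***opnames***)** and **noclear** can be sketched with the stage5a_jkw dataset from the Examples. The particular sequence of changes is only an illustration.

```stata
webuse stage5a_jkw, clear
svyset [pweight=pw], jkrweight(jkw_*) vce(jackknife) mse

* Drop only the mse setting; weights and vce() are kept
svyset, clear(mse)

* Change the default variance estimator; other settings are kept
svyset, vce(linearized) noclear

* Report the resulting settings
svyset
```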

__Examples__

Setup
**. webuse stage5a**

Simple random sampling with replacement
**. svyset _n**

One-stage clustered design with stratification
**. svyset su1 [pweight=pw], strata(strata)**

Two-stage designs
**. svyset su1 [pweight=pw], fpc(fpc1) || _n, fpc(fpc2)**
**. svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2)**
**. svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2) strata(strata)**

Multiple-stage designs
**. svyset su1 [pweight=pw], fpc(fpc1) strata(strata) || su2, fpc(fpc2)**
**|| su3, fpc(fpc3)**
**. svyset su1 [pweight=pw], fpc(fpc1) strata(strata) || su2, fpc(fpc2)**
**|| su3, fpc(fpc3) || _n**

Finite population correction (FPC)
**. webuse fpc**
**. list**
**. svyset psuid [pweight=weight], strata(stratid) fpc(Nh)**
**. svy: mean x**
**. svyset psuid [pweight=weight], strata(stratid)**
**. svy: mean x**

Multiple-stage designs and with-replacement sampling
**. webuse stage5a**
**. svyset su1 || _n, fpc(fpc2)**

Replication weight variables
**. webuse stage5a_jkw**
**. svyset [pweight=pw], jkrweight(jkw_*) vce(jackknife)**
**. svyset [pweight=pw], jkrweight(jkw_*) vce(jackknife) mse**

__Video example__

Specifying the design of your survey data to Stata

__Stored results__

**svyset** stores the following in **r()**:

Scalars
**r(stages)** number of sampling stages
**r(stages_wt)** last stage containing stage-level weights

Macros
**r(wtype)** weight type
**r(wexp)** weight expression
**r(wvar)** weight variable name
**r(weight***#***)** variable identifying weight for stage *#*
**r(su***#***)** variable identifying sampling units for stage *#*
**r(strata***#***)** variable identifying strata for stage *#*
**r(fpc***#***)** FPC for stage *#*
**r(bsrweight)** **bsrweight()** variable list
**r(bsn)** bootstrap mean-weight adjustment
**r(brrweight)** **brrweight()** variable list
**r(fay)** Fay's adjustment
**r(jkrweight)** **jkrweight()** variable list
**r(sdrweight)** **sdrweight()** variable list
**r(sdrfpc)** **fpc()** value from within **sdrweight()**
**r(vce)** *vcetype* specified in **vce()**
**r(dof)** **dof()** value
**r(mse)** **mse**, if specified
**r(poststrata)** **poststrata()** variable
**r(postweight)** **postweight()** variable
**r(rake)** **rake()** specification
**r(regress)** **regress()** specification
**r(settings)** **svyset** arguments to reproduce the current settings
**r(singleunit)** **singleunit()** setting
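
A short sketch of reading the stored results after declaring a two-stage design from the Examples:

```stata
webuse stage5a, clear
svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2)

* Reporting the settings also leaves them in r()
svyset
return list

* Individual results: a scalar and several macros
display r(stages)
display "`r(wtype)' `r(wvar)'"
display "`r(su1)' `r(su2)'"
display `"`r(settings)'"'
```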

__Reference__

Judkins, D. R. 1990. Fay's method for variance estimation. *Journal of*
*Official Statistics* 6: 223-239.