# Re: st: Stata svyset definition of strata?

From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Stata svyset definition of strata?
Date: Thu, 12 May 2005 15:32:25 -0500

Don Lloyd <dlloyd@fsu.edu> asks how to use -svyset-:

> I'd like to be sure I understand what Stata means by "strata" for the svyset
> functions. Our group selected a large community based random sample of
> disabled residents and a nondisabled control group in a Florida county. We
> randomly sampled 100 PSUs within the county; households within selected PSUs
> were then randomly selected and screened for disabled adults. We chose cases
> from the screened adult list such that equal proportions of men and women,
> and equal proportions of each ethnic/racial group, were included in the
> disabled target sample. For each disabled person, we then selected a similar
> but non-disabled person from within the same PSU, gender, and ethnic group,
> who was within 5 years of age. The achieved samples have similar, though
> not identical, distributions on gender, ethnicity, and age. We've developed
> weights to adjust the distributions of nondisabled to match the disabled, in
> these three dimensions. Two disability statuses by two sexes by three age
> groups by five ethnic groups yield 60 strata -- is this the correct term
> with respect to Stata's svyset function? While everyone in the disabled
> group gets unit weight, the nondisabled are weighted according to which of
> the 30 nondisabled cells (strata?) they belong to. Applying the weights
> removes association between these factors and disability status.
>
> My central question is, given the above procedures, should I use the
> strata(stratvar) portion of the svyset option, in addition to
> [iweight=wtvar] and psu(psuvar).

Don is describing a multistage design:

Stage 1:	100 randomly sampled PSUs
		no stratification

Stage 2:	households selected within PSU
		no extra stratification

Stage 3:	individuals screened/selected within household
		stratified on gender, ethnicity, and age group

Suppose we have the following variables in a Stata dataset using the above
design:

psuid	-- identifies the PSUs
n_psu	-- stage 1 FPC (number of PSUs in the population)
houseid	-- identifies the households
n_hh	-- stage 2 FPC (number of households within PSU)
gender	-- identifies an individual's gender
eth	-- identifies an individual's ethnicity
agegrp	-- identifies an individual's age group
wgt	-- sampling weight

In Stata 8, -svyset- will only allow you to specify the design characteristics
for the first stage.  Here is how Don can -svyset- the data:

svyset [pw=wgt], psu(psuid)

Note that we do not include the stage 1 FPC here, because of the additional
sampling within the PSUs.

In Stata 9, all three stages of this design can be specified using -svyset-.
First, generate the stage 3 strata variable:

egen strid = group(gender eth agegrp)
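
Before using -strid- in -svyset-, it may be worth checking what -egen- produced.
The lines below are only a sketch: they assume -strid- was created as above,
and -first- is a made-up name for a helper variable.  Note that -group()-
numbers only the combinations that actually occur in the data and yields
missing whenever any of the three variables is missing.

```stata
* Count the observations in each stratum, showing any missing values of strid
tabulate strid, missing

* Flag one observation per stratum (first is a hypothetical helper variable)
egen first = tag(strid)

* Show which gender/ethnicity/age-group combination each stratum stands for
sort strid
list strid gender eth agegrp if first
```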

We can now use the new syntax of -svyset- to specify the survey design
characteristics for our dataset:

svyset psuid [pw=wgt], fpc(n_psu) || houseid, fpc(n_hh) || _n, strata(strid)

The double-or-bars "||" separate the survey design characteristics for each
stage:

Stage 1:	psuid [pw=wgt], fpc(n_psu)
Stage 2:	houseid, fpc(n_hh)
Stage 3:	_n, strata(strid)

where "_n" indicates that the individuals were sampled within -houseid-.
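
Once the data are -svyset-, the settings can be reviewed and then used through
the -svy:- prefix in Stata 9.  The following is a sketch rather than a
recommended analysis: -y- and -disabled- are hypothetical variable names (an
outcome and a disability indicator) that do not appear in the list above.

```stata
* Redisplay the survey design settings attached to the dataset
svyset

* Describe the strata and sampling units
svydes

* Design-based mean of the hypothetical outcome y
svy: mean y

* The same mean, estimated separately for each value of disabled
svy: mean y, over(disabled)
```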

If Don does not have a variable like -n_psu-, meaning that the PSUs were
sampled with replacement or the first stage sampling fraction is small enough
to ignore, then only the sampling weights and PSUs are relevant.  In this case
the survey settings are

svyset psuid [pw=wgt]

--Jeff