Re: st: Stata svyset definition of strata?

From (Jeff Pitblado, StataCorp LP)
Subject   Re: st: Stata svyset definition of strata?
Date   Thu, 12 May 2005 15:32:25 -0500

Don Lloyd asks how to use -svyset-:

> I'd like to be sure I understand what Stata means by "strata" for the svyset
> functions. Our group selected a large community based random sample of
> disabled residents and a nondisabled control group in a Florida county. We
> randomly sampled 100 PSUs within the county; households within selected PSUs
> were then randomly selected and screened for disabled adults. We chose cases
> from the screened adult list such that equal proportions of men and women,
> and equal proportions of each ethnic/racial group, were included in the
> disabled target sample. For each disabled person, we then selected a similar
> but non-disabled person from within the same PSU, gender, and ethnic group,
> who was within 5 years of age. The achieved samples have similar, though
> not identical, distributions on gender, ethnicity, and age. We've developed
> weights to adjust the distributions of nondisabled to match the disabled, in
> these three dimensions. Two disability statuses by two sexes by three age
> groups by five ethnic groups yield 60 strata -- is this the correct term
> with respect to Stata's svyset function? While everyone in the disabled
> group gets unit weight, the nondisabled are weighted according to which of
> the 30 nondisabled cells (strata?) they belong to. Applying the weights
> removes association between these factors and disability status.
> My central question is, given the above procedures, should I use the
> strata(stratvar) portion of the svyset option, in addition to
> [iweight=wtvar] and psu(psuvar)?

Don is describing a multistage design:

	Stage 1:	100 randomly sampled PSUs
			no stratification

	Stage 2:	households selected within PSU
			no extra stratification

	Stage 3:	individuals screened/selected within household
			stratified on gender, ethnicity, and age group

Suppose we have the following variables in a Stata dataset using the above
design:

	psuid	-- identifies the PSUs
	n_psu	-- stage 1 FPC (number of PSUs in the population)
	houseid	-- identifies the households
	n_hh	-- stage 2 FPC (number of households within PSU)
	gender	-- identifies an individual's gender
	eth	-- identifies an individual's ethnicity
	agegrp	-- identifies an individual's age group
	wgt	-- sampling weight

In Stata 8, -svyset- will only allow you to specify the design characteristics
for the first stage.  Here is how Don can -svyset- the data:

	svyset [pw=wgt], psu(psuid)

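Note that we do not include the Stage 1 FPC: there is further sampling within
the PSUs, and applying a finite population correction while specifying only the
first stage would understate the variance.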
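
As a rough sketch of what might follow in Stata 8 (this is not from Don's
post), the settings can be reviewed and then used for estimation.  The outcome
variable -y- is hypothetical; substitute a variable from your own data.

	* display the current survey settings
	svyset
	* describe the PSUs in the sample
	svydes
	* design-based estimate of the mean of y (hypothetical variable)
	svymean y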

In Stata 9, all three stages of this design can be specified using -svyset-.
First, generate the stage 3 strata variable:

	egen strid = group(gender eth agegrp)
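
Before using -strid-, it may be worth checking it.  The lines below are only
a sketch: by default -egen-'s group() function returns missing when any of its
variables is missing, and with the two genders, five ethnic groups, and three
age groups Don describes there should be at most 30 distinct values.

	* observations left out of every stratum because of missing values
	count if missing(strid)
	* number of distinct stage 3 strata
	quietly tabulate strid
	display r(r)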

We can now use the new syntax of -svyset- to specify the survey design
characteristics for our dataset:

	svyset psuid [pw=wgt], fpc(n_psu) || houseid, fpc(n_hh) || _n, strata(strid)

The double-or-bars "||" separate the survey design characteristics for each
stage:

	Stage 1:	psuid [pw=wgt], fpc(n_psu)
	Stage 2:	houseid, fpc(n_hh)
	Stage 3:	_n, strata(strid)

where "_n" indicates that the individuals were sampled within -houseid-.
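
Once the design is set, estimation commands take the -svy:- prefix in Stata 9.
A minimal sketch, assuming a hypothetical outcome variable -y- and a
hypothetical indicator -disabled- for disability status (neither is in the
variable list above):

	* describe the strata and PSUs
	svydes
	* design-based mean of y for each disability group
	svy: mean y, over(disabled)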

If Don does not have a variable like -n_psu-, meaning that the PSUs were
sampled with replacement or the first stage sampling fraction is small enough
to ignore, then only the sampling weights and PSUs are relevant.  In this case
the survey settings are

	svyset psuid [pw=wgt]
