
# Re: st: weight in a field survey

From: Stas Kolenikov | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: weight in a field survey | Date: Tue, 23 Mar 2010 11:11:59 -0500

```On Mon, Mar 22, 2010 at 9:57 AM, Estelle PASQUIER
<estelle_pasquier@yahoo.fr> wrote:
> We are conducting a field survey on
> malaria. Our population is sampled by stratifying it in rural and urban
> settings (first step) then selecting villages with a probability proportional
> to their population sizes. The last step consists in selecting randomly a fixed
> number of households in the village.
> I am a little bit concern with the sample
> weights I have to choose with svy: my suggestion would be to calculate the
> stratum weight as the ratio of total population on population in each stratum;
> to calculate the cluster weight as the inverse of the sampling probability, and
> then to use the product of these two weights as final weight. Am I right?

Your stratum weight is the proportion of the population in a given
stratum. To see why, write the overall mean as

y-bar (overall)
= sum {over all population elements i} y_i / N
= sum {over strata h} sum {over units i in stratum h} y_hi / N
= sum {over strata h} N_h * y-bar_h / N
= sum {over strata h} (N_h / N) * y-bar_h

where N is the total population size, N_h is the size of stratum h,
and y-bar_h is the mean in stratum h. So your stratum weight is N_h/N.
Remember, you are NOT sampling at the strata level.

Now, when you DO sample (at the PSU and SSU levels), your weights are
indeed inverse probabilities of selection. Hence the probability
component of your weight for the household j in village i in stratum h
is

(1/probability of selection of the PSU) times (1/probability of
selection of the SSU)
= (total estimated size of all PSUs in stratum h/estimated size of the
sampled PSU hi) times (actual number of households in PSU hi/# of
households eventually sampled from PSU hi)

The departure from uniform weights, as noted by Steve, comes from
discrepancies between the estimated and the actual sizes of the PSUs.
If you had reasonably good preliminary data to base your design on,
this won't have any tragic consequences for the variability of the
weights. To incorporate the non-response adjustments at this stage
(cf. Steve's suggestions on post-stratification), you can replace the
second term by (actual number of households in PSU hi/# of households
eventually observed from PSU hi).

Your ultimate weight will be the product of: (i) stratum weight =
proportion of the overall population that is in the stratum;
(ii) probability weight, as above; (iii) non-response and
post-stratification adjustment, as suggested by Steve. If you have
post-stratification information, you can also incorporate it directly
with the

svyset , poststrata( ) postweight( )

options. That's always a better option than just incorporating the
post-stratification weight into the sampling weight.

> My second concern is to know if I there is
> anything to do for the last step of this sampling design?

Technically, if sampling is performed with replacement at any stage,
you do not need to do anything about the subsequent stages: you get
truly i.i.d. data. If your sampling fractions are small, you can
ignore the later stages to a good approximation, and that's what is
commonly done when sampling fractions are in single-digit percentages.
So a relatively flat design

svyset PSU [pw=your weight computed as above], strata( rural/urban or

will give reasonably good variance estimates.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
```
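
The weight construction described in the post can be sketched in Stata roughly as follows. This is only an illustration under stated assumptions, not the author's own code: every variable name below (`stratum`, `village`, `N_h`, `N`, `M_h`, `M_hi`, `hh_total`, `hh_obs`, `malaria`) is hypothetical and assumed to already exist in the household-level dataset. The formulas simply transcribe the three components as they are worded in the post.

```stata
* Hypothetical variable names; none of them come from the original post.
*   stratum  : stratum identifier (e.g. 1 = rural, 2 = urban)
*   village  : PSU identifier
*   N_h, N   : population of stratum h and of the whole population
*   M_h      : total estimated size of all PSUs in stratum h
*   M_hi     : estimated size of the sampled PSU hi
*   hh_total : actual number of households in PSU hi
*   hh_obs   : number of households observed in PSU hi
*   malaria  : outcome variable (assumed)

* (i) stratum weight: share of the population in stratum h
generate double w_stratum = N_h / N

* (ii) probability weight, as worded in the post: inverse PSU selection
*      probability times inverse household selection probability.
*      Dividing by observed rather than sampled households folds in
*      the simple non-response adjustment mentioned in the post.
generate double w_psu = M_h / M_hi
generate double w_ssu = hh_total / hh_obs

* Final weight: product of the components
generate double w_final = w_stratum * w_psu * w_ssu

* Inspect the spread of the weights before using them
summarize w_final, detail

* Relatively flat design: PSUs within strata, later stages ignored
svyset village [pweight = w_final], strata(stratum)
svy: mean malaria
```

If post-stratification totals are available, the post recommends passing them through the `poststrata()` and `postweight()` options of `svyset` instead of multiplying them into `w_final`.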