


Re: st: svyset, fpc command and stratified cluster sampling

From   Stas Kolenikov <>
Subject   Re: st: svyset, fpc command and stratified cluster sampling
Date   Fri, 12 Mar 2010 14:20:18 -0600

On Fri, Mar 12, 2010 at 12:56 AM, Pierre DeBeaudrap wrote:

> I plan to analyse data from a community survey with cluster sampling, and I
> am not completely sure how to set up svyset.
> The study design is stratified two-stage cluster sampling. The strata were
> urban and rural; villages (PSUs) were then sampled with probability
> proportional to the number of households. In the second stage, 30 households
> were randomly selected per village.
> We don't know the population size of the villages, but we know the population
> size of each stratum, and these figures are more reliable than the number of
> households.
> svyset village [pw=proba], strata(urban) fpc( )
> For fpc, do I have to enter the number of clusters per stratum or the
> population size of each stratum?

The finite population corrections are only applicable when you use SRS. This
is a piece of fine print in complex survey data analysis that is often not
understood. Ideally, you'd want to construct the Horvitz-Thompson estimator
with appropriate variance estimates computed from pairwise probabilities of
selection. Realistically, you'd have to either assume sampling with
replacement and ignore fpc, or assume SRS (which is obviously a long stretch
for your design) and enter the # of PSUs in the population for your fpc. The
# of villages is probably fine for the rural part, and I would venture a guess
that the urban stratum was sampled in a different way.
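
To make the two realistic options concrete, here is a minimal sketch of how each
might be declared. It assumes that proba already holds the sampling weight (the
inverse of the selection probability, which is what pweights are meant to be),
and that n_villages is a hypothetical variable, not in the original post,
holding the number of villages in the population of each stratum. The outcome y
is a placeholder as well.

```stata
* Option 1: treat villages as sampled with replacement; no fpc() is given
svyset village [pweight=proba], strata(urban)

* Option 2: pretend villages were drawn by SRS within each stratum and
* supply the population count of villages per stratum as the fpc.
* n_villages is a hypothetical variable, constant within a stratum.
svyset village [pweight=proba], strata(urban) fpc(n_villages)

* Review the declared design, then estimate as usual (y is a placeholder)
svydescribe
svy: mean y
```

Only the last svyset issued stays in effect, so in practice you would run one
of the two declarations, not both.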

> Is there any further information I can provide for the level 2?

If you are willing to assume sampling with replacement, you don't need to
provide any additional information. Second-stage information won't even be
used in estimation.

Interestingly, you should not worry too much about the weights. If the
measure of size that your sampling people used (estimated # of households
per PSU) is at least approximately right, you will have a self-weighting
scheme that results in equal probabilities of selection. So your weights
should be approximately constant. That's a pretty good design, actually.
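
The self-weighting argument can be sketched with a little algebra. The notation
below is my own assumption, not from the original post, and the result holds
only to the extent that the size measure used at the first stage matches the
actual number of households in each village.

```latex
% Assumed notation (not in the original post):
%   a_h    = number of villages sampled in stratum h
%   M_{hi} = number of households in village i of stratum h (the size measure)
%   M_h    = \sum_i M_{hi}, total number of households in stratum h
\begin{align*}
\pi_{hi}^{(1)} &\approx a_h \frac{M_{hi}}{M_h}
  && \text{(stage 1: village drawn with probability proportional to size)} \\
\pi_{j \mid hi}^{(2)} &= \frac{30}{M_{hi}}
  && \text{(stage 2: 30 households out of } M_{hi}\text{)} \\
\pi_{hij} &= \pi_{hi}^{(1)} \, \pi_{j \mid hi}^{(2)} \approx \frac{30\, a_h}{M_h},
  \qquad w_{hij} = \frac{1}{\pi_{hij}} \approx \frac{M_h}{30\, a_h}
\end{align*}
```

The village size M_{hi} cancels, so under these assumptions the weight is the
same for every household within a stratum. It is also the same across strata
only if a_h / M_h happens to be equal in the urban and rural parts.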

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.
