

From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: weight in a field survey
Date: Tue, 23 Mar 2010 11:11:59 -0500

On Mon, Mar 22, 2010 at 9:57 AM, Estelle PASQUIER <estelle_pasquier@yahoo.fr> wrote:
> We are conducting a field survey on malaria. Our population is sampled by
> stratifying it in rural and urban settings (first step), then selecting
> villages with a probability proportional to their population sizes. The
> last step consists in selecting randomly a fixed number of households in
> the village.
> I am a little bit concerned with the sample weights I have to choose with
> svy: my suggestion would be to calculate the stratum weight as the ratio of
> total population on population in each stratum; to calculate the cluster
> weight as the inverse of the sampling probability, and then to use the
> product of these two weights as final weight. Am I right?

Your stratum weight is the proportion of the population in a given stratum. Since you compute the overall mean as

  y-bar_overall = sum {over all population elements i} y_i / N
                = sum {over strata h} sum {over units i in stratum h} y_hi / N
                = sum {over strata h} N_h * y-bar_h / N
                = sum {over strata h} (N_h / N) * y-bar_h

where N is the total population size, N_h is the size of stratum h, and y-bar_h is the mean in stratum h, your stratum weight is N_h/N. Remember, you are NOT sampling at the strata level.

Now, when you DO sample (at the PSU and SSU levels), your weights are indeed inverse probabilities of selection. Hence the probability component of your weight for household j in village i in stratum h is

  (1/probability of selection of the PSU) times (1/probability of selection of the SSU)
  = (total estimated size of all PSUs in stratum h / estimated size of the sampled PSU hi)
    times (actual number of households in PSU hi / # of households eventually sampled from PSU hi)

The discrepancy from the uniform weights, as noted by Steve, comes from discrepancies between the estimated and the actual size of the PSUs.
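As a rough illustration of the probability component described above, here is a minimal Python sketch. The function name, argument names, and all village numbers are hypothetical, and it assumes the PSU size measure is expressed in households so that the two factors are on the same scale. It only reproduces the product as written in this message, not a complete weighting procedure.

```python
def selection_weight(stratum_est_size, psu_est_size,
                     psu_actual_households, households_sampled):
    """Probability component of the weight, as the product given in the message.

    All argument names are hypothetical; sizes are assumed to be in households.
    """
    values = (stratum_est_size, psu_est_size,
              psu_actual_households, households_sampled)
    if min(values) <= 0:
        raise ValueError("all sizes and counts must be positive")
    # 1 / (probability of selecting the PSU), using the estimated sizes
    psu_factor = stratum_est_size / psu_est_size
    # 1 / (probability of selecting a household within the sampled PSU)
    household_factor = psu_actual_households / households_sampled
    return psu_factor * household_factor


# Hypothetical stratum with an estimated 50,000 households; 20 households
# are sampled in each selected village.
weight_a = selection_weight(50_000, 1_000, 1_000, 20)  # estimate matches actual
weight_b = selection_weight(50_000, 2_000, 2_000, 20)  # estimate matches actual
weight_c = selection_weight(50_000, 1_000, 1_200, 20)  # village larger than estimated

print(weight_a, weight_b, weight_c)
```

With these made-up numbers, villages A and B get the same weight even though their sizes differ, because their size estimates were accurate. Village C, whose actual size exceeds its estimate, gets a larger weight, which is the source of the non-uniform weights mentioned above.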
If you had reasonably good preliminary data to base your design on, these size discrepancies won't have any tragic consequences for the variability of the weights. To incorporate the non-response adjustments at this stage (cf. Steve's suggestions on post-stratification), you can replace the second term by (actual number of households in PSU hi / # of households eventually observed from PSU hi).

Your ultimate weight will be the product of:
(i) stratum weight = proportion of the stratum in the overall population;
(ii) probability weight, as above;
(iii) non-response and post-stratification adjustment, as suggested by Steve.

If you have post-stratification information, you can also incorporate it directly with the poststrata() and postweight() options of svyset. That's always a better option than just folding the post-stratification weight into the sampling weight.

> My second concern is to know if there is anything to do for the last step
> of this sampling design?

Technically, if sampling is performed with replacement at any stage, you do not need to do anything about the subsequent stages: you get truly i.i.d. data. If your sampling fractions are small, you approximately don't have to do anything either, and that's what is commonly done when sampling fractions are in single-digit percentages. So a relatively flat design

  svyset PSU [pw=<your weight computed as above>], strata(<rural/urban or whatever your stratification was>)

will give reasonably good variance estimates.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
