
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Finite population correction with clustering of SE at a different level than the strata

From   Stas Kolenikov <>
Subject   Re: st: Finite population correction with clustering of SE at a different level than the strata
Date   Wed, 6 Jun 2012 07:42:43 -0500

On Mon, Jun 4, 2012 at 8:57 AM, Ole Dahl Rasmussen <> wrote:
> Dear Statalist,
> As part of a cluster randomized controlled trial, my colleagues and I are doing stratified sampling, and we're not sure we're analyzing the data correctly. Any suggestions would be much appreciated.
> We have 46 villages. Before anything else, we went to all villages and asked them if they would be interested in participating in the project we were about to implement. We wrote down the names of the interested households on lists. We then stratified the population by village and interest: on the household population lists we marked the interested households and, in each village, randomly selected a fixed number of households, 24 of the interested and 14 of the non-interested, for a total of 1,750 households out of a population of approximately 3,000 households. In the end we have a total of 92 interest/village combinations, which we define as our stratas in the analysis. The sampling rate inside the stratas varies from 10% to 100%.
> Then we randomly selected 23 of the villages and implemented a project in these 23 villages.
> After two years, we surveyed everybody again.
> Finally, following Cameron/Trivedi p 817 in Microeconometrics and others, we estimate the following:
> svyset vid [pweight=weights], fpc(one) || _n, strata(strataID) fpc(f) singleunit(certainty)

This is a weird design specification. This is what it says:
1. your PSUs are identified by -vid-, but
2. they don't contribute any variance at the first stage, since an
fpc of 1 (a 100% sampling rate of villages, so the correction factor
1 - f is zero) kills all variability at that stage
3. Then, at the next stage, you have a stratified SRSWOR sample of
observations, with strata given by -strataID- and fpc given by -f-. If
there are any strata where only one observation is being used,
disregard the contribution to variance from such strata.
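Points 2 and 3 can be made concrete with the textbook variance estimator for a stratified SRSWOR estimate of a total: V(Y-hat) = sum over strata h of N_h^2 (1 - f_h) s_h^2 / n_h, where N_h is the stratum population size, n_h the sample size, f_h = n_h / N_h and s_h^2 the sample variance. Below is a minimal Python sketch assuming that formula; the function names and toy numbers are hypothetical, and this is the standard estimator rather than Stata's own code. The factor (1 - f_h) is the finite population correction: it is zero whenever f_h = 1, which is the same reason fpc(one) at the first stage removes all first-stage variance. Singleton strata are given zero contribution here, roughly what singleunit(certainty) does.

```python
import statistics


def stratum_term(values, N_h):
    """Contribution of one stratum to the estimated variance of a total
    under stratified simple random sampling without replacement (SRSWOR).

    values: observed y-values for the n_h sampled units in stratum h
    N_h:    population size of stratum h (hypothetical input)
    """
    n_h = len(values)
    if n_h < 2:
        # A singleton stratum gives no within-stratum variance estimate;
        # here it simply contributes nothing (like singleunit(certainty)).
        return 0.0
    f_h = n_h / N_h                     # sampling fraction, what fpc(f) encodes
    s2_h = statistics.variance(values)  # sample variance, n_h - 1 denominator
    return N_h ** 2 * (1 - f_h) * s2_h / n_h


def stratified_total_variance(strata):
    """strata: list of (values, N_h) pairs; returns the estimated variance of the total."""
    return sum(stratum_term(values, N_h) for values, N_h in strata)


# Hypothetical toy data: (sampled y-values, stratum population size)
strata = [
    ([1.0, 0.0, 1.0, 1.0], 40),  # f_h = 0.10: sparsely sampled stratum
    ([0.0, 1.0, 1.0, 0.0], 4),   # f_h = 1.00: a census, contributes 0
    ([1.0], 10),                 # singleton stratum, contributes 0
]
print(stratified_total_variance(strata))  # prints 90.0
```

Only the stratum sampled at 10% contributes; the census stratum and the singleton stratum add nothing, however variable their outcomes are.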

In a sense, (2) indicates that this sample is not generalizable to
any population; whether that is true or not depends on where the 46
initial villages came from. If they were sampled from a larger
population, then you would need to account for that in the first
stage. If you somehow got stuck with them based on what the national
government gave you, then it is indeed impossible to say how your
microfinance could work in the population as a whole beyond the sample
that you have. If you do care about correlations of the units within
villages (which is the advice you seem to be getting from the empirical
economics literature: cluster as high as you can, then come up with a
justification as to why you have done so), you should omit the -fpc()-
option in the first stage and treat these villages as if they had
been sampled from a larger population in the first place.
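To see what dropping the first-stage fpc changes, here is a minimal Python sketch of the usual first-stage ("ultimate cluster") variance estimator for an estimated total, (1 - f) * n / (n - 1) * sum of (z_i - z-bar)^2, assuming villages are the PSUs and z_i is the weighted outcome total for village i. The function name and village totals are hypothetical, and this illustrates the textbook formula, not Stata's internal code. Setting f = 1.0 mimics fpc(one): the between-village spread is multiplied by zero. Setting f = 0.0 mimics omitting fpc(), the with-replacement approximation, under which the spread of village totals drives the standard error.

```python
def first_stage_total_variance(psu_totals, f=0.0):
    """Ultimate-cluster variance estimate of an estimated total.

    psu_totals: weighted totals z_i, one per PSU (here: one per village)
    f:          first-stage sampling fraction; f = 0.0 mimics omitting fpc()
                (with-replacement approximation), f = 1.0 mimics fpc(one)
    """
    n = len(psu_totals)
    if n < 2:
        raise ValueError("need at least two PSUs to estimate variance")
    zbar = sum(psu_totals) / n
    ss = sum((z - zbar) ** 2 for z in psu_totals)  # between-PSU sum of squares
    return (1 - f) * n * ss / (n - 1)


# Hypothetical weighted outcome totals for four villages
village_totals = [120.0, 80.0, 100.0, 100.0]
print(first_stage_total_variance(village_totals))         # no fpc: about 1066.67
print(first_stage_total_variance(village_totals, f=1.0))  # fpc(one): prints 0.0
```

The same villages yield a positive variance once they are treated as a sample from a larger population, and exactly zero when they are declared to be the whole population.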

Note that "stratum" is singular and "strata" are plural, so "stratas"
is a non-word.

---- Stas Kolenikov
---- Senior Survey Statistician, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer


© Copyright 1996–2017 StataCorp LLC