Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Finite population correction with clustering of SE at a different level than the strata

From: Ole Dahl Rasmussen
To: "statalist@hsphsun2.harvard.edu"
Subject: st: Finite population correction with clustering of SE at a different level than the strata
Date: Mon, 4 Jun 2012 13:57:08 +0000

```Dear Statalist,

As part of a cluster randomized controlled trial, my colleagues and I used stratified sampling, and we are not sure whether we are analyzing the data correctly. We would be grateful for any suggestions.

We have 46 villages. Before anything else, we visited every village and asked households whether they would be interested in participating in the project we were about to implement, and we recorded the names of the interested households on lists. We then stratified the population by village and interest: on the household population lists we marked the interested households and, in each village, randomly selected a fixed number of households, 24 from the interested and 14 from the non-interested. This gave 1750 households out of a total population of approximately 3000 households. In the end we have 92 village/interest combinations, which we define as our strata in the analysis. The sampling rate within strata varies from 10% to 100%.

Then we randomly selected 23 of the villages and implemented a project in these 23 villages.

After two years, we surveyed everybody again.

Finally, following Cameron and Trivedi (Microeconometrics, p. 817) and others, we estimate the following:

svyset vid [pweight=weights], fpc(one) || _n, strata(strataID) fpc(f) singleunit(certainty)
svy: reg consumption treatXendline endline treat

where
- vid is an ID variable for villages, the level at which I want clustered standard errors
- weights is the inverse probability of sampling
- one is a constant equal to 1
- consumption is a consumption measure
- treatXendline is the interaction between selection as treatment village and endline
- endline is an endline dummy
- treat is a treatment dummy
- f is the total probability of sampling
- strataID is an ID variable for the strata, i.e. the 92 village/interest combinations

So for the questions:
. Are we doing it right?
. In particular, is our finite population correction justified?
. We want to cluster standard errors at the village level, because we think this is the relevant level, i.e. not the strata level. Is this the right way of doing it?

Any suggestions and thoughts are appreciated.

On behalf of the team,
Ole Dahl Rasmussen
University of Southern Denmark


```
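
The definitions of `weights` and `f` in the post can be made concrete with a short Stata sketch. It assumes two hypothetical variables that the post does not name: `N_h`, the number of households on the list in each stratum, and `n_h`, the number sampled from it. Reading `f` as the within-stratum sampling fraction is also an assumption, since the post only describes it as "the total probability of sampling". The `svyset` line is copied from the post unchanged, and nothing here settles whether that design declaration is the right one.

```stata
* Sketch only. N_h and n_h are hypothetical variable names (not from the post):
*   N_h = households on the list in the stratum
*   n_h = households sampled from that stratum (24 or 14 by design)
* Assumption: f is the within-stratum sampling fraction.
generate double f = n_h / N_h
generate double weights = 1 / f        // inverse probability of sampling

* The post reports sampling rates between 10% and 100%;
* this stops with an error if any f is missing or outside (0, 1].
assert f > 0 & f <= 1
summarize f weights

* Design declaration exactly as written in the post
* (vid, one, and strataID are the variables described there).
svyset vid [pweight=weights], fpc(one) || _n, strata(strataID) fpc(f) singleunit(certainty)

* List the strata and sampling units Stata sees, to compare with the
* 46 villages and 92 strata described in the post.
svydescribe
```

Checking the `svydescribe` output against the known design is a cheap way to spot a mismatch between the sampling plan and what was declared in `svyset` before interpreting any standard errors.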