Steve Samuels <sjsamuels@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: specifying SVYSET in household survey using multi-stage clustered sampling

Sun, 3 Oct 2010 16:35:25 -0400

Sorry. I sent an early draft of my reply and there are some remnants of sections that I later deleted (the references to "pseudo-strata"). Please use the following:

Strata: create a new variable "my_stratum"
1. Every camp is a stratum.
For the refugees living in gatherings:
2. The gatherings in each region constitute a single stratum.

Thus the number of strata will be H = no. of camps + no. of regions. You will have to create a numbering scheme for strata that includes them both.

Define the sampling units and fpcs

In the camp strata, define
psu = building ID
fpc = estimated no. of buildings in the camp
(If you listed individual households, then for "building" above, substitute "hh".)
ssu2 = hh ID
fpc2 = no. of HH in the building
ssu3 = hh ID
fpc3 = 1.0

In the region strata for gatherings, define
psu = gathering ID
fpc = no. of gatherings in the region
ssu2 = building ID
fpc2 = no. of buildings in the gathering
ssu3 = hh ID
fpc3 = no. of HH in a selected building (might be just 1)

You need two -svyset- statements: one for estimating descriptive statistics (e.g. means, proportions), and one for regressions and other tests of association.

****svyset for descriptive stats*****************
svyset psu [pweight=weights], strata(my_stratum) fpc(fpc) singleunit(certainty) || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
********************

The -svyset- for analytic statistics is the same as the previous one but omits the fpcs:

****svyset for regression and tests*****************
svyset psu [pweight=weights], strata(my_stratum) singleunit(certainty) || ssu2 || ssu3
********************

The incorrect degrees of freedom will probably not be much of a problem for country-wide statistics, but could be for region-specific statistics. See E. Korn and B. Graubard (1999), Analysis of Health Surveys, Wiley, NY, Section 5.2 (p. 193), for some suggestions.

Best of luck,

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

> On Sun, Oct 3, 2010 at 7:43 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
>> Dear Steve,
>>
>> Thank you for taking the time! As for your questions:
>>
>> 1. That varies across regions, generally 50-60% in camps and 40-50% in
>> gatherings. This information has been provided by the agency
>> responsible for the refugees. I compared it with NGO data where
>> available and think these are good guesstimates.
>>
>> 2. In each region between two and six gatherings were selected.
>> a. We select the first gathering with a probability proportionate to
>> its population.
>> b. If the population of the gathering selected is less than half the
>> region's gathering population, I select another gathering; otherwise I
>> stop selecting gatherings.
>> c. The second gathering is also selected with a probability
>> proportionate to its size (the population of the first gathering
>> selected has been deducted from the gathering population of the entire
>> region).
>> d. If the cumulative population in the two selected regions is less
>> than half the country's total population, I select another region as
>> described above; otherwise I stop selecting regions.
>>
>> 3. We sampled buildings from satellite images. The questionnaire
>> contains information on how many HH live in each building sampled.
>> More than one questionnaire could be administered per building.
>>
>> 4. The weights are a separate issue. I am working with someone from
>> the maths department here and did not want to clutter this email or
>> the list with non-Stata-related problems. I will carry out the checks
>> you recommended.
>>
>> Karin
>>
>> On Sat, Oct 2, 2010 at 10:24 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> Thanks Karin
>>>
>>> Some more questions and I think I can provide a workable -svyset- command
>>>
>>> 1. What proportions of the population (HH?) are inside and outside
>>> camps? How did you know this?
>>> 2. How many gatherings did you select for the sample?
>>> 3. What was the sampling process for HH in the camps and in the
>>> sampled gatherings? It
>> is assumed refugees live only in camps or gatherings.
>>
>> We collected individual information about each household member (age,
>> education, employment etc.) but also aggregate information (household
>> expenditure, household assets etc.).
>>
>> We hope to estimate descriptive proportions as well as carry out some
>> analysis (e.g. what affects household income, or at the individual
>> level, what 'predicts' health status).
>>
>> Best
>> Karin
>>
>> On Fri, Oct 1, 2010 at 5:19 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> Karin,
>>>
>>> I found your description confusing. I want to reconstruct the survey
>>> design in terms that I can understand, so I'll start with the basics.
>>> Here's what I think you have done. Please correct me if I
>>> misunderstand.
>>>
>>> 1) Your survey area is divided into regions.
>>>
>>> 2) Every region had at least one camp. You selected all camps into
>>> the study and took a sample of HH from each.
>>>
>>> 3) In all regions, refugees could also live in "gatherings" outside
>>> camps. You selected a _sample_ of these gatherings in each region.
>>> Within each selected gathering, you took a sample of HH.
>>>
>>> Question: did you also study refugees who lived neither in camps nor in gatherings?
>>>
>>> Question: within HH, did you obtain aggregate information, or
>>> information about each member?
>>>
>>> You have stated that one purpose of the study is to obtain estimates for
>>> each region. Are these primarily estimates of descriptive statistics
>>> (e.g. proportions)?
>>>
>>> Steve
>>>
>>> On Fri, Oct 1, 2010 at 2:22 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
>>>> Dear Stata List,
>>>>
>>>> We have run a large household survey among refugees.
>>>>
>>>> Refugees live in clusters of camps or outside-camp gatherings within
>>>> several regions.
>>>>
>>>> We stratified our sample by 'camp' vs. 'outside camp gatherings' (1)
>>>> and region (2).
>>>> In strata (1) we under- and oversampled households to obtain robust
>>>> regional estimates.
>>>> Within strata (2), the camp/outside camp strata, we sampled households
>>>> proportional to the share of households living inside or outside
>>>> camps.
>>>>
>>>> We selected clusters within these two strata as follows:
>>>> a) We selected all camps in all regions and
>>>> b) a certain number of gatherings in all regions. Gatherings were
>>>> selected with probabilities proportionate to their population within
>>>> each region. They were sampled without replacement.
>>>>
>>>> Within the selected clusters, we used simple random sampling to select
>>>> refugee households. Within each cluster we sampled about 5-10% of the
>>>> population. Since we are unsure about exact camp/gathering populations
>>>> and we sample a small share, we assume sampling with replacement.
>>>>
>>>> I do have sampling weights (inverse probability of a HH being
>>>> selected) and have adjusted for over- and under-sampling within the
>>>> regional strata (variable called 'weights'). Some strata contain a
>>>> singleton SU (one region has only one camp), which we treat as
>>>> certainty units.
>>>>
>>>> I am unsure how to specify -svyset-. Below is how I think the output
>>>> of -svydes- should look. Does it look correct? I would be
>>>> grateful for help with the question marks below.

pweight: weights
VCE: linearized
Single unit: certainty
Strata 1: camp/gathering
SU 1: ?
FPC 1: ?
Strata 2: regions
SU 2: households
FPC 2: number of households per region


I am sorry to take your time. I would really appreciate your help!
Please also correct any mistakes or inconsistencies in my reasoning.

Many Thanks
Karin Seyfert
PhD Candidate
School of Oriental and African Studies
University of London

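Steve's instruction to build one stratum variable that numbers the camp strata and the regional gathering strata together can be sketched in Stata as follows. This is a minimal sketch on hypothetical toy data: the variable names hh_id, region and camp_id, and the assumption that camp_id already runs 1, 2, ... across all regions (missing for households in gatherings), are illustrative and not taken from the thread. Gathering strata are numbered after the last camp, which gives H = no. of camps + no. of regions.

```stata
* Hypothetical toy data: one row per sampled household.
* camp_id is missing for households living in gatherings.
clear
input hh_id region camp_id
1 1 1
2 1 2
3 1 .
4 2 3
5 2 .
end

* Strata 1..C: every camp is its own stratum
* (assumes camp_id is numbered 1..C across all regions)
generate my_stratum = camp_id if !missing(camp_id)

* Strata C+1..C+R: all gatherings in a region form a single stratum
quietly summarize camp_id
replace my_stratum = r(max) + region if missing(camp_id)

* Number of distinct strata should equal no. of camps + no. of regions
tabulate my_stratum
```

In this toy example the three camps become strata 1 to 3 and the gatherings of regions 1 and 2 become strata 4 and 5. Offsetting by the largest camp number keeps the two kinds of strata from colliding without needing a lookup table.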

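Karin's sequential rule for selecting gatherings within a region (draw one gathering with probability proportionate to population, remove it from the pool, and keep drawing until the selected gatherings hold at least half of the region's gathering population) can be sketched in Stata as follows. The gathering numbers, populations and seed are hypothetical, and the sketch only reproduces the selection step; it does not compute the inclusion probabilities needed for the weights.

```stata
* Hypothetical gatherings in one region and their populations
clear
set seed 2010
input gathering pop
1  400
2 1200
3  300
4  900
5  200
end

generate byte selected = 0
quietly summarize pop
local total = r(sum)
local chosen_pop = 0

* Keep drawing while the selected gatherings cover less than
* half of the region's gathering population
while `chosen_pop' < `total'/2 {
    * Running total of population over gatherings not yet selected
    generate double cum = sum(pop * !selected)
    * Uniform draw in (0, remaining population]
    local u = (1 - runiform()) * cum[_N]
    * Select the unselected gathering whose population interval contains the draw
    quietly replace selected = 1 if !selected & (cum - pop) < `u' & cum >= `u'
    drop cum
    * Update the cumulative population of the selected gatherings
    quietly summarize pop if selected
    local chosen_pop = r(sum)
}

list gathering pop selected
```

Each pass selects exactly one remaining gathering with probability proportionate to its population among those still in the pool, which mirrors the "deducted from the gathering population" step in item c.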