

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: specifying SVYSET in household survey using multi-stage clustered sampling
Date: Sun, 3 Oct 2010 11:06:20 -0400

Hello, Karin.

I think you need to stop calling the gathering strata "regions", and call them the "gathering population in each region" or just the "gathering strata". "Regions" (camps + gatherings) define an analysis unit.

Create two data sets:

Households: for analysis of hh outcomes and statistics
Individuals: for analysis of individual outcomes & statistics

The same -svyset- statements (below) should work for each.

These kinds of designs, which mingle two different sizes of PSUs, households in the camps and gatherings in the remainder of each region, are difficult to set up and analyze. The main problem is that the small number of gatherings sampled in each region gives poor estimates of variability and of degrees of freedom (df). I'm going to give you a liberal set-up, which will give incorrect degrees of freedom, and give a reference to the problem at the end.

Strata: create a new variable "my_stratum"

1. Every camp is a stratum.

For the refugees living in gatherings:
2. The gatherings in each region constitute a single stratum.

Thus the number of strata will be

H = no. of camps + no. of regions

You will have to create a numbering scheme for strata that includes them both.

Define the sampling units and fpcs

In the camp strata, define

psu  = building ID
fpc  = estimated no. of buildings in the camp
(If you listed individual households, then for "building" above, substitute "hh".)
ssu2 = hh ID
fpc2 = no. of HH in the building
ssu3 = hh ID
fpc3 = 1.0

In the region strata for gatherings, define

psu  = (pseudo-)gathering ID
fpc  = no. of (pseudo-)gatherings in the region
(alternatively, if gatherings in the region differ greatly in size: the proportion of the region gathering population in the selected gatherings, but there is little theory to justify this.)
ssu2 = building ID
fpc2 = no. of buildings in the gathering
ssu3 = hh ID
fpc3 = no. of HH in a selected building (might be just 1)

You need two -svyset- statements, one for estimating descriptive statistics (e.g. means, proportions), one for regressions and other tests of association.

****svyset for descriptive stats*****************
svyset psu [pweight = weight], strata(my_stratum) fpc(fpc) singleunit(certainty) || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
********************

The -svyset- for analytic statistics is the same as the previous one but omits the fpc's.

****svyset for regression and tests*****************
svyset psu [pweight = weight], strata(my_stratum) singleunit(certainty) || ssu2 || ssu3
********************

The incorrect degrees of freedom will probably not be much of a problem for country-wide statistics, but could be for region-specific statistics. See E Korn and B Graubard (1999) Analysis of Health Surveys, Wiley, NY, Section 5.2 (p 193), for some suggestions.

Best of luck,

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

On Sun, Oct 3, 2010 at 7:43 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
> Dear Steve,
>
> Thank you for taking the time! As for your questions:
>
> 1. That varies across region, generally 50-60% in camps and 40-50% in
> gatherings. This information has been provided by the agency
> responsible for the refugees. I compared them with NGO data where
> available and think they are good guesstimates.
>
> 2. In each region between two and six gatherings were selected.
> a. We select the first gathering with a probability proportionate to
> its population.
> b. If the population of the gathering selected is less than half the
> region's gathering population, I select another gathering, otherwise I
> stop selecting gatherings.
> c. The second gathering is also selected with a probability
> proportionate to its size (the population of the first gathering
> selected has been deducted from the gathering population of the entire
> region)
> 4. If the cumulative population in the two selected regions is less
> than half the country's total population, I select another region as
> described above, otherwise I stop selecting regions.
>
> 3. We sampled buildings from satellite images. The questionnaire
> contains information on how many HH live in each building sampled.
> More than one questionnaire could be administered per building.
>
> 4. The weights are a separate issue. I am working with someone from
> the maths department here and did not want to clutter this email or
> the list with non-stata related problems. I will carry out the checks
> you recommended.
>
> Karin
>
> On Sat, Oct 2, 2010 at 10:24 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Thanks Karin
>>
>> Some more questions and I think I can provide a workable -svyset- command
>>
>> 1. What proportions of the population (HH?) are inside and outside
>> camps? How did you know this?
>> 2. How many gatherings did you select for the sample?
>> 3. What was the sampling process for HH in the camps and in the
>> sampled gathering? I'm guessing that you listed all of them first.
>>
>> Not needed to do -svyset-, but important:
>>
>> Have you checked to see if the sum of the HH weights in the sample is
>> close to the known number of HH for the sample and that this is true
>> separately inside and outside the camps and for each region?
>>
>> Steve
>>
>> On Fri, Oct 1, 2010 at 11:33 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
> --
> Dear Steve,
>
> Thank you so much for your quick reply. I am sorry if I was confusing,
> but you have re-formulated the survey design correctly and much more
> clearly.
>
> As for your questions:
>
> We did not study refugees living in neither camps nor gatherings. It
> is assumed refugees live only in camps or gatherings.
>
> We collected individual information about each household member (age,
> education, employment etc.) but also aggregate information (household
> expenditure, household assets etc.).
>
> We hope to estimate descriptive proportions as well as carry out some
> analysis (i.e. what affects household income, or at the individual
> level, what 'predicts' health status)
>
> Best
> Karin
>
> On Fri, Oct 1, 2010 at 5:19 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Karin,
>>
>> I found your description confusing. I want to reconstruct the survey
>> design in terms that I can understand, so I'll start with the basics.
>> Here's what I think you have done. Please correct me if I
>> misunderstand.
>>
>> 1) Your survey area is divided into regions
>>
>> 2) Every region had at least one camp. You selected all camps into
>> the study and took a sample of HH from each.
>>
>> 3) In all regions, refugees could also live in "gatherings" outside
>> camps. You selected a _sample_ of these gatherings in each region.
>> Within each selected gathering, you took a sample of HH.
>>
>> Question: did you also study refugees who lived neither in camps nor gatherings?
>>
>> Question: within HH, did you obtain aggregate information, or
>> information about each member?
>>
>> You have stated that one purpose of the study is to obtain estimates for
>> each region. Are these primarily estimates of descriptive statistics
>> (e.g. proportions?)
>>
>> Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> On Fri, Oct 1, 2010 at 2:22 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
>>> Dear stata List,
>>>
>>> we have run a large household survey among refugees.
>>>
>>> Refugees live in clusters of camps or outside camp gatherings within
>>> several regions.
>>>
>>> We stratified our sample by 'camp' vs. 'outside camp gatherings' (1)
>>> and region (2).
>>> In strata (1) we under- and oversampled households to obtain robust
>>> regional estimates.
>>> Within strata (2), the camp/outside camp strata, we sampled households
>>> proportional to the share of households living inside or outside
>>> camps.
>>>
>>> We selected clusters within these two strata as follows:
>>> a) We selected all camps in all regions and
>>> b) a certain number of gatherings in all regions. Gatherings were
>>> selected with probabilities proportionate to their population within
>>> each region. They were sampled without replacement.
>>>
>>> Within the selected clusters, we used simple random sampling to select
>>> refugee households. Within each cluster we sampled about 5-10% of the
>>> population. Since we are unsure about exact camp/gathering populations
>>> and we sample a small share, we assume sampling with replacement.
>>>
>>> I do have sampling weights (inverse probability of a HH being
>>> selected) and have adjusted for over- and under-sampling within the
>>> regional strata (variable called 'weights'). Some strata contain a
>>> singleton SU (one region has only one camp), which we treat as
>>> certainty units.
>>>
>>> I am unsure how to specify -svyset-. Below is how I think the response
>>> to -svydes- should look like. Does it look correct? I would be
>>> grateful for help with the question marks below. I am also unsure what
>>> to specify as PSU, households or clusters?
>>>
>>> pweight: weights
>>> VCE: linearized
>>> Single unit: certainty
>>> Strata 1: camp/gathering
>>> SU 1: ?
>>> FPC 1: ?
>>> Strata 2: regions
>>> SU 2: households
>>> FPC 2: number of households per region
>>>
>>>
>>> I am sorry to take your time. I would really appreciate your help!
>>> Please also correct any mistakes or inconsistencies in my reasoning.
>>>
>>> Many Thanks
>>> Karin Seyfert
>>> PhD Candidate
>>> School of Oriental and African Studies
>>> University of London
>>>
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
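
The stratum numbering scheme described in the 3 Oct reply, and the weight check suggested in the 2 Oct message, could be set up along the following lines. This is only a minimal sketch, assuming a household-level data set with numeric variables camp_id, region_id and in_camp (1 for camp households, 0 for gathering households); those three names are hypothetical and not from the thread. Only my_stratum and weight follow the messages above.

```stata
* Sketch only. camp_id, region_id and in_camp are hypothetical names;
* my_stratum and weight follow the thread.

* One stratum per camp, coded 1, 2, ... among the camp households
egen camp_stratum = group(camp_id) if in_camp == 1

* One stratum per region for the households living in gatherings
egen region_stratum = group(region_id) if in_camp == 0

* Offset the gathering strata so the two sets of codes do not overlap
summarize camp_stratum, meanonly
generate my_stratum = cond(in_camp == 1, camp_stratum, r(max) + region_stratum)

* Every household should now belong to a stratum
assert !missing(my_stratum)

* Number of distinct strata; should equal no. of camps + no. of regions
* if gatherings were sampled in every region
quietly tabulate my_stratum
display "H = " r(r)

* Weight check: sum of household weights by region, separately inside and
* outside camps, for comparison with the known numbers of households
tabstat weight if in_camp == 1, by(region_id) statistics(sum)
tabstat weight if in_camp == 0, by(region_id) statistics(sum)
```

Running -svydescribe- after either -svyset- statement is one way to confirm that the strata and first-stage unit counts match the design.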

