

From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: specifying SVYSET in household survey using multi-stage clustered sampling
Date: Sun, 3 Oct 2010 16:35:25 -0400

Sorry. I sent an early draft of my reply, and it contained remnants of sections that I later deleted (the references to "pseudo-strata"). Please use the following:

Strata: create a new variable "my_stratum"
1. Every camp is a stratum.

For the refugees living in gatherings:
2. The gatherings in each region constitute a single stratum.

Thus the number of strata will be
H = no. of camps + no. of regions

You will have to create a numbering scheme for strata that includes them both.

Define the sampling units and fpcs

In the camp strata, define
psu = building ID
fpc = estimated no. of buildings in the camp
(If you listed individual households, then for "building" above, substitute "hh".)
ssu2 = hh ID
fpc2 = no. of HH in the building
ssu3 = hh ID
fpc3 = 1.0

In the region strata for gatherings, define
psu = gathering ID
fpc = no. of gatherings in the region
ssu2 = building ID
fpc2 = no. of buildings in the gathering
ssu3 = hh ID
fpc3 = no. of HH in a selected building (might be just 1)

You need two -svyset- statements: one for estimating descriptive statistics (e.g., means, proportions), and one for regressions and other tests of association.

****svyset for descriptive stats*****************
svyset psu [pweight = weight], strata(my_stratum) singleunit(certainty) fpc(fpc) ///
    || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
********************

The -svyset- for analytic statistics is the same as the previous one but omits the fpcs.

****svyset for regression and tests*****************
svyset psu [pweight = weight], strata(my_stratum) singleunit(certainty) ///
    || ssu2 || ssu3
********************

The incorrect degrees of freedom will probably not be much of a problem for country-wide statistics, but could be for region-specific statistics. See E. Korn and B. Graubard (1999), Analysis of Health Surveys, Wiley, NY, Section 5.2 (p. 193), for some suggestions.

On Sun, Oct 3, 2010 at 11:06 AM, Steve Samuels <[email protected]> wrote:
> Hello, Karin.
>
> I think you need to stop calling the gathering strata "regions", and
> call them the "gatherings population in each region" or just the
> "gathering strata". "Regions" (camps + gatherings) define an analysis
> unit.
>
> Create two data sets:
> Households: for analysis of hh outcomes and statistics.
> Individuals: for analysis of individual outcomes & statistics.
>
> The same -svyset- statements (below) should work for each.
>
> These kinds of designs, which mingle two different sizes of PSUs,
> households in the camps and gatherings in the remainder of each
> region, are difficult to set up and analyze. The main problem is that
> the small number of gatherings sampled in each region gives poor
> estimates of variability and of degrees of freedom (df). I'm going to
> give you a liberal set-up, which will give incorrect degrees of
> freedom, and give a reference to the problem at the end.
>
> Strata: create a new variable "my_stratum"
> 1. Every camp is a stratum.
>
> For the refugees living in gatherings:
> 2. The gatherings in each region constitute a single stratum.
>
> Thus the number of strata will be
> H = no. of camps + no. of regions
>
> You will have to create a numbering scheme for strata that includes them both.
>
> Define the sampling units and fpcs
>
> In the camp strata, define
> psu = building ID
> fpc = estimated no. of buildings in the camp
> (If you listed individual households, then for "building" above,
> substitute "hh".)
> ssu2 = hh ID
> fpc2 = no. of HH in the building
> ssu3 = hh ID
> fpc3 = 1.0
>
> In the region strata for gatherings, define
> psu = gathering ID
> fpc = no. of gatherings in the region
> (alternatively, if gatherings in the region differ greatly in size:
> the proportion of the region gathering population in the selected
> gatherings, but there is little theory to justify this.)
> ssu2 = building ID
> fpc2 = no. of buildings in the gathering
> ssu3 = hh ID
> fpc3 = no. of HH in a selected building (might be just 1)
>
> You need two -svyset- statements: one for estimating descriptive
> statistics (e.g., means, proportions), and one for regressions and
> other tests of association.
>
> ****svyset for descriptive stats*****************
> svyset psu [pweight = weight], strata(my_stratum) ///
>     singleunit(certainty) fpc(fpc) || ssu2, fpc(fpc2) ///
>     || ssu3, fpc(fpc3)
> ********************
>
> The -svyset- for analytic statistics is the same as the previous one
> but omits the fpcs.
>
> ****svyset for regression and tests*****************
> svyset psu [pweight = weight], strata(my_stratum) ///
>     singleunit(certainty) || ssu2 || ssu3
> ********************
>
> The incorrect degrees of freedom will probably not be much of a
> problem for country-wide statistics, but could be for region-specific
> statistics. See E. Korn and B. Graubard (1999), Analysis of Health
> Surveys, Wiley, NY, Section 5.2 (p. 193), for some suggestions.
>
>
> Best of luck,
>
> Steve
>
> Steven J. Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
>
>
> On Sun, Oct 3, 2010 at 7:43 AM, Karin Seyfert <[email protected]> wrote:
>> Dear Steve,
>>
>> Thank you for taking the time! As for your questions:
>>
>> 1. That varies across regions, generally 50-60% in camps and 40-50% in
>> gatherings. This information has been provided by the agency
>> responsible for the refugees. I compared the figures with NGO data where
>> available and think they are good guesstimates.
>>
>> 2. In each region between two and six gatherings were selected.
>> a. We select the first gathering with a probability proportionate to
>> its population.
>> b. If the population of the gathering selected is less than half the
>> region's gathering population, I select another gathering, otherwise I
>> stop selecting gatherings.
>> c. The second gathering is also selected with a probability
>> proportionate to its size (the population of the first gathering
>> selected has been deducted from the gathering population of the entire
>> region).
>> d. If the cumulative population in the two selected regions is less
>> than half the country's total population, I select another region as
>> described above, otherwise I stop selecting regions.
>>
>> 3. We sampled buildings from satellite images. The questionnaire
>> contains information on how many HH live in each building sampled.
>> More than one questionnaire could be administered per building.
>>
>> 4. The weights are a separate issue. I am working with someone from
>> the maths department here and did not want to clutter this email or
>> the list with non-Stata-related problems. I will carry out the checks
>> you recommended.
>>
>> Karin
>>
>> On Sat, Oct 2, 2010 at 10:24 PM, Steve Samuels <[email protected]> wrote:
>>> Thanks Karin
>>>
>>> Some more questions and I think I can provide a workable -svyset- command
>>>
>>> 1. What proportions of the population (HH?) are inside and outside
>>> camps? How did you know this?
>>> 2. How many gatherings did you select for the sample?
>>> 3. What was the sampling process for HH in the camps and in the
>>> sampled gatherings? I'm guessing that you listed all of them first.
>>>
>>> Not needed to do -svyset-, but important:
>>>
>>> Have you checked to see if the sum of the HH weights in the sample is
>>> close to the known number of HH for the sample and that this is true
>>> separately inside and outside the camps and for each region?
>>>
>>> Steve
>>>
>
> On Fri, Oct 1, 2010 at 11:33 AM, Karin Seyfert <[email protected]> wrote:
>> Dear Steve,
>>
>> Thank you so much for your quick reply. I am sorry if I was confusing,
>> but you have re-formulated the survey design correctly and much more
>> clearly.
>>
>> As for your questions:
>>
>> We did not study refugees who lived in neither camps nor gatherings. It
>> is assumed refugees live only in camps or gatherings.
>>
>> We collected individual information about each household member (age,
>> education, employment etc.) but also aggregate information (household
>> expenditure, household assets etc.).
>>
>> We hope to estimate descriptive proportions as well as carry out some
>> analysis (i.e. what affects household income, or at the individual
>> level, what 'predicts' health status).
>>
>> Best
>> Karin
>>
>> On Fri, Oct 1, 2010 at 5:19 PM, Steve Samuels <[email protected]> wrote:
>>> Karin,
>>>
>>> I found your description confusing. I want to reconstruct the survey
>>> design in terms that I can understand, so I'll start with the basics.
>>> Here's what I think you have done. Please correct me if I
>>> misunderstand.
>>>
>>> 1) Your survey area is divided into regions.
>>>
>>> 2) Every region had at least one camp. You selected all camps into
>>> the study and took a sample of HH from each.
>>>
>>> 3) In all regions, refugees could also live in "gatherings" outside
>>> camps. You selected a _sample_ of these gatherings in each region.
>>> Within each selected gathering, you took a sample of HH.
>>>
>>> Question: did you also study refugees who lived neither in camps nor in gatherings?
>>>
>>> Question: within HH, did you obtain aggregate information, or
>>> information about each member?
>>>
>>> You have stated that one purpose of the study is to obtain estimates for
>>> each region. Are these primarily estimates of descriptive statistics
>>> (e.g. proportions)?
>>>
>>> Steve
>>>
>>> Steven J. Samuels
>>> [email protected]
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax: 206-202-4783
>>>
>>> On Fri, Oct 1, 2010 at 2:22 AM, Karin Seyfert <[email protected]> wrote:
>>>> Dear Stata List,
>>>>
>>>> we have run a large household survey among refugees.
>>>>
>>>> Refugees live in clusters of camps or outside camp gatherings within
>>>> several regions.
>>>>
>>>> We stratified our sample by 'camp' vs. 'outside camp gatherings' (1)
>>>> and region (2).
>>>> In strata (1) we under- and oversampled households to obtain robust
>>>> regional estimates.
>>>> Within strata (2), the camp/outside camp strata, we sampled households
>>>> proportional to the share of households living inside or outside
>>>> camps.
>>>>
>>>> We selected clusters within these two strata as follows:
>>>> a) We selected all camps in all regions and
>>>> b) a certain number of gatherings in all regions. Gatherings were
>>>> selected with probabilities proportionate to their population within
>>>> each region. They were sampled without replacement.
>>>>
>>>> Within the selected clusters, we used simple random sampling to select
>>>> refugee households. Within each cluster we sampled about 5-10% of the
>>>> population. Since we are unsure about exact camp/gathering populations
>>>> and we sample a small share, we assume sampling with replacement.
>>>>
>>>> I do have sampling weights (inverse probability of a HH being
>>>> selected) and have adjusted for over- and under-sampling within the
>>>> regional strata (variable called 'weights'). Some strata contain a
>>>> singleton SU (one region has only one camp), which we treat as
>>>> certainty units.
>>>>
>>>> I am unsure how to specify -svyset-. Below is what I think the response
>>>> to -svydes- should look like. Does it look correct? I would be
>>>> grateful for help with the question marks below. I am also unsure what
>>>> to specify as PSU: households or clusters?
>>>>
>>>> pweight: weights
>>>> VCE: linearized
>>>> Single unit: certainty
>>>> Strata 1: camp/gathering
>>>> SU 1: ?
>>>> FPC 1: ?
>>>> Strata 2: regions
>>>> SU 2: households
>>>> FPC 2: number of households per region
>>>>
>>>>
>>>> I am sorry to take your time. I would really appreciate your help!
>>>> Please also correct any mistakes or inconsistencies in my reasoning.
>>>>
>>>> Many Thanks
>>>> Karin Seyfert
>>>> PhD Candidate
>>>> School of Oriental and African Studies
>>>> University of London
>>>>
>>
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
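
For readers who want to try the set-up described in the reply, here is a minimal do-file sketch, not a definitive implementation. It builds one combined stratum identifier (one stratum per camp plus one gathering stratum per region), runs the weight check suggested earlier in the thread, and then issues the two -svyset- statements in Stata's multistage form (`|| ssu, fpc()`). The variables in_camp, camp_id, region_id, and hh_expenditure are hypothetical names chosen for illustration; psu, fpc, ssu2, fpc2, ssu3, fpc3, and weight are assumed to have been created as described in the reply.

```stata
* Sketch only. in_camp, camp_id, region_id, and hh_expenditure are
* hypothetical names; psu, fpc, ssu2, fpc2, ssu3, fpc3, and weight are
* assumed to exist as defined in the reply.

* One stratum per camp, plus one gathering stratum per region
generate str30 stratum_label = cond(in_camp == 1, ///
    "camp_" + string(camp_id), "region_" + string(region_id))
egen my_stratum = group(stratum_label)

* Sum of the weights by stratum, to compare with the known HH counts
tabstat weight, by(my_stratum) statistics(sum)

* Descriptive statistics: fpc at every stage
svyset psu [pweight = weight], strata(my_stratum) singleunit(certainty) ///
    fpc(fpc) || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
svy: mean hh_expenditure, over(region_id)

* Regressions and tests: same design, no fpcs
svyset psu [pweight = weight], strata(my_stratum) singleunit(certainty) ///
    || ssu2 || ssu3
svydescribe
```

The `///` line continuations work in a do-file only; typed interactively, each -svyset- command must go on a single line. Each -svyset- call replaces the previous settings, so estimation commands pick up whichever statement was issued last.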
