

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: specifying SVYSET in household survey using multi-stage clustered sampling
Date: Sun, 3 Oct 2010 16:36:24 -0400

Actually, the references were to "pseudo-gatherings".
S.

On Sun, Oct 3, 2010 at 4:35 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Sorry. I sent an early draft of my reply, and there are some remnants
> of sections that I later deleted (the references to "pseudo-strata").
> Please use the following:
>
> Strata: create a new variable "my_stratum"
> 1. Every camp is a stratum.
>
> For the refugees living in gatherings:
> 2. The gatherings in each region constitute a single stratum.
>
> Thus the number of strata will be
> H = no. of camps + no. of regions
>
> You will have to create a numbering scheme for strata that includes them both.
>
> Define the sampling units and fpcs
> In the camp strata, define
> psu = building ID
> fpc = estimated no. of buildings in the camp
> (If you listed individual households, then substitute "hh" for
> "building" above.)
> ssu2 = hh ID
> fpc2 = no. of HH in the building
> ssu3 = hh ID
> fpc3 = 1.0
>
> In the region strata for gatherings, define
> psu = gathering ID
> fpc = no. of gatherings in the region
> ssu2 = building ID
> fpc2 = no. of buildings in the gathering
> ssu3 = hh ID
> fpc3 = no. of HH in a selected building (might be just 1)
>
> You need two -svyset- statements: one for estimating descriptive
> statistics (e.g. means, proportions), and one for regressions and other
> tests of association.
>
> ****svyset for descriptive stats*****************
> svyset psu [pweight=weight], strata(my_stratum) fpc(fpc) ///
>     singleunit(certainty) || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
> ********************
>
> The -svyset- for analytic statistics is the same as the previous one
> but omits the fpcs.
>
> ****svyset for regression and tests*****************
> svyset psu [pweight=weight], strata(my_stratum) ///
>     singleunit(certainty) || ssu2 || ssu3
> ********************
>
> The incorrect degrees of freedom will probably not be much of a
> problem for country-wide statistics, but could be for region-specific
> statistics.
> See E. Korn and B. Graubard (1999), Analysis of Health
> Surveys, Wiley, NY, Section 5.2 (p. 193), for some suggestions.
>
> On Sun, Oct 3, 2010 at 11:06 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Hello, Karin.
>>
>> I think you need to stop calling the gathering strata "regions" and
>> call them the "gathering population in each region" or just the
>> "gathering strata". "Regions" (camps + gatherings) define an analysis
>> unit.
>>
>> Create two data sets:
>> Households: for analysis of HH outcomes and statistics
>> Individuals: for analysis of individual outcomes and statistics
>>
>> The same -svyset- statements (below) should work for each.
>>
>> These kinds of designs, which mingle two different sizes of PSUs
>> (households in the camps and gatherings in the remainder of each
>> region), are difficult to set up and analyze. The main problem is that
>> the small number of gatherings sampled in each region gives poor
>> estimates of variability and of degrees of freedom (df). I'm going to
>> give you a liberal setup, which will give incorrect degrees of
>> freedom, and give a reference to the problem at the end.
>>
>> Strata: create a new variable "my_stratum"
>> 1. Every camp is a stratum.
>>
>> For the refugees living in gatherings:
>> 2. The gatherings in each region constitute a single stratum.
>>
>> Thus the number of strata will be
>> H = no. of camps + no. of regions
>>
>> You will have to create a numbering scheme for strata that includes them both.
>>
>> Define the sampling units and fpcs
>> In the camp strata, define
>> psu = building ID
>> fpc = estimated no. of buildings in the camp
>> (If you listed individual households, then substitute "hh" for
>> "building" above.)
>> ssu2 = hh ID
>> fpc2 = no. of HH in the building
>> ssu3 = hh ID
>> fpc3 = 1.0
>>
>> In the region strata for gatherings, define
>> psu = gathering ID
>> fpc = no. of gatherings in the region
>> (alternatively, if gatherings in the region differ greatly in size:
>> the proportion of the region's gathering population in the selected
>> gatherings, but there is little theory to justify this.)
>> ssu2 = building ID
>> fpc2 = no. of buildings in the gathering
>> ssu3 = hh ID
>> fpc3 = no. of HH in a selected building (might be just 1)
>>
>> You need two -svyset- statements: one for estimating descriptive
>> statistics (e.g. means, proportions), and one for regressions and other
>> tests of association.
>>
>> ****svyset for descriptive stats*****************
>> svyset psu [pweight=weight], strata(my_stratum) fpc(fpc) ///
>>     singleunit(certainty) || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
>> ********************
>>
>> The -svyset- for analytic statistics is the same as the previous one
>> but omits the fpcs.
>>
>> ****svyset for regression and tests*****************
>> svyset psu [pweight=weight], strata(my_stratum) ///
>>     singleunit(certainty) || ssu2 || ssu3
>> ********************
>>
>> The incorrect degrees of freedom will probably not be much of a
>> problem for country-wide statistics, but could be for region-specific
>> statistics. See E. Korn and B. Graubard (1999), Analysis of Health
>> Surveys, Wiley, NY, Section 5.2 (p. 193), for some suggestions.
>>
>> Best of luck,
>>
>> Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> On Sun, Oct 3, 2010 at 7:43 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
>>> Dear Steve,
>>>
>>> Thank you for taking the time! As for your questions:
>>>
>>> 1. That varies across regions: generally 50-60% in camps and 40-50% in
>>> gatherings. This information has been provided by the agency
>>> responsible for the refugees. I compared it with NGO data where
>>> available and think the figures are good guesstimates.
>>>
>>> 2. In each region, between two and six gatherings were selected.
>>> a. We select the first gathering with a probability proportionate to
>>> its population.
>>> b. If the population of the gathering selected is less than half the
>>> region's gathering population, I select another gathering; otherwise I
>>> stop selecting gatherings.
>>> c. The second gathering is also selected with a probability
>>> proportionate to its size (the population of the first gathering
>>> selected has been deducted from the gathering population of the entire
>>> region).
>>> 4. If the cumulative population in the two selected regions is less
>>> than half the country's total population, I select another region as
>>> described above; otherwise I stop selecting regions.
>>>
>>> 3. We sampled buildings from satellite images. The questionnaire
>>> contains information on how many HH live in each building sampled.
>>> More than one questionnaire could be administered per building.
>>>
>>> 4. The weights are a separate issue. I am working with someone from
>>> the maths department here and did not want to clutter this email or
>>> the list with non-Stata-related problems. I will carry out the checks
>>> you recommended.
>>>
>>> Karin
>>>
>>> On Sat, Oct 2, 2010 at 10:24 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> Thanks, Karin.
>>>>
>>>> Some more questions, and then I think I can provide a workable -svyset- command:
>>>>
>>>> 1. What proportions of the population (HH?) are inside and outside
>>>> camps? How did you know this?
>>>> 2. How many gatherings did you select for the sample?
>>>> 3. What was the sampling process for HH in the camps and in the
>>>> sampled gatherings? I'm guessing that you listed all of them first.
>>>>
>>>> Not needed to do -svyset-, but important:
>>>>
>>>> Have you checked to see if the sum of the HH weights in the sample is
>>>> close to the known number of HH for the sample and that this is true
>>>> separately inside and outside the camps and for each region?
>>>>
>>>> Steve
>>>>
>>
>> On Fri, Oct 1, 2010 at 11:33 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
>>> Dear Steve,
>>>
>>> Thank you so much for your quick reply. I am sorry if I was confusing,
>>> but you have reformulated the survey design correctly and much more
>>> clearly.
>>>
>>> As for your questions:
>>>
>>> We did not study refugees living in neither camps nor gatherings. It
>>> is assumed refugees live only in camps or gatherings.
>>>
>>> We collected individual information about each household member (age,
>>> education, employment, etc.) but also aggregate information (household
>>> expenditure, household assets, etc.).
>>>
>>> We hope to estimate descriptive proportions as well as carry out some
>>> analysis (e.g. what affects household income, or, at the individual
>>> level, what 'predicts' health status).
>>>
>>> Best
>>> Karin
>>>
>>> On Fri, Oct 1, 2010 at 5:19 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> Karin,
>>>>
>>>> I found your description confusing. I want to reconstruct the survey
>>>> design in terms that I can understand, so I'll start with the basics.
>>>> Here's what I think you have done. Please correct me if I
>>>> misunderstand.
>>>>
>>>> 1) Your survey area is divided into regions.
>>>>
>>>> 2) Every region had at least one camp. You selected all camps into
>>>> the study and took a sample of HH from each.
>>>>
>>>> 3) In all regions, refugees could also live in "gatherings" outside
>>>> camps. You selected a _sample_ of these gatherings in each region.
>>>> Within each selected gathering, you took a sample of HH.
>>>>
>>>> Question: did you also study refugees who lived in neither camps nor gatherings?
>>>>
>>>> Question: within HH, did you obtain aggregate information, or
>>>> information about each member?
>>>>
>>>> You have stated that one purpose of the study is to obtain estimates for
>>>> each region. Are these primarily estimates of descriptive statistics
>>>> (e.g. proportions)?
>>>>
>>>> Steve
>>>>
>>>> On Fri, Oct 1, 2010 at 2:22 AM, Karin Seyfert <karin.seyfert@gmail.com> wrote:
>>>>> Dear Statalist,
>>>>>
>>>>> We have run a large household survey among refugees.
>>>>>
>>>>> Refugees live in clusters of camps or outside-camp gatherings within
>>>>> several regions.
>>>>>
>>>>> We stratified our sample by 'camp' vs. 'outside camp gatherings' (1)
>>>>> and region (2).
>>>>> In strata (1) we under- and oversampled households to obtain robust
>>>>> regional estimates.
>>>>> Within strata (2), the camp/outside camp strata, we sampled households
>>>>> proportional to the share of households living inside or outside
>>>>> camps.
>>>>>
>>>>> We selected clusters within these two strata as follows:
>>>>> a) We selected all camps in all regions and
>>>>> b) a certain number of gatherings in all regions. Gatherings were
>>>>> selected with probabilities proportionate to their population within
>>>>> each region. They were sampled without replacement.
>>>>>
>>>>> Within the selected clusters, we used simple random sampling to select
>>>>> refugee households. Within each cluster we sampled about 5-10% of the
>>>>> population. Since we are unsure about exact camp/gathering populations
>>>>> and we sample a small share, we assume sampling with replacement.
>>>>>
>>>>> I do have sampling weights (inverse probability of a HH being
>>>>> selected) and have adjusted for over- and under-sampling within the
>>>>> regional strata (variable called 'weights'). Some strata contain a
>>>>> singleton SU (one region has only one camp), which we treat as
>>>>> certainty units.
>>>>>
>>>>> I am unsure how to specify -svyset-. Below is what I think the output
>>>>> of -svydes- should look like. Does it look correct? I would be
>>>>> grateful for help with the question marks below. I am also unsure what
>>>>> to specify as PSU: households or clusters?
>>>>>
>>>>> pweight: weights
>>>>> VCE: linearized
>>>>> Single unit: certainty
>>>>> Strata 1: camp/gathering
>>>>> SU 1: ?
>>>>> FPC 1: ?
>>>>> Strata 2: regions
>>>>> SU 2: households
>>>>> FPC 2: number of households per region
>>>>>
>>>>> I am sorry to take up your time. I would really appreciate your help!
>>>>> Please also correct any mistakes or inconsistencies in my reasoning.
>>>>>
>>>>> Many thanks,
>>>>> Karin Seyfert
>>>>> PhD Candidate
>>>>> School of Oriental and African Studies
>>>>> University of London

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
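The stratum and sampling-unit definitions in the thread above can be turned into variables roughly as in the following do-file sketch. It is an illustration on made-up data, not code posted in the thread: every variable name (region, camp_id, gathering_id, building_id, hh_id, the n_* population counts) and every value is hypothetical, and the IDs are assumed to be unique across the whole data set.

```stata
* Hypothetical toy data: one row per sampled household (HH).
* camp_id is missing for HH in gatherings; gathering_id is missing
* for HH in camps. The n_* variables are assumed population counts.
clear
input region camp_id gathering_id building_id hh_id n_bldg_camp n_hh_bldg n_gath_region n_bldg_gath weight
1 1 . 11 111 50 3 . . 40
1 1 . 12 121 50 3 . . 40
1 . 1 13 131 . 2 6 20 90
1 . 2 14 141 . 2 6 20 90
2 2 . 21 211 40 2 . . 35
2 2 . 22 221 40 2 . . 35
2 . 3 23 231 . 1 5 15 80
2 . 4 24 241 . 1 5 15 80
end

* Strata: each camp is its own stratum; the gatherings in a region form
* one stratum. The missing option keeps camp_id == . as its own level,
* so H = no. of camps + no. of regions.
generate byte in_camp = !missing(camp_id)
egen my_stratum = group(in_camp region camp_id), missing

* Stage units: camp strata are building -> HH (HH repeated as a dummy
* third stage); gathering strata are gathering -> building -> HH.
generate long psu  = cond(in_camp, building_id, gathering_id)
generate long ssu2 = cond(in_camp, hh_id, building_id)
generate long ssu3 = hh_id

* Population sizes for each stage; fpc3 = 1 makes the dummy camp stage
* a census, so it contributes no variance.
generate fpc  = cond(in_camp, n_bldg_camp, n_gath_region)
generate fpc2 = cond(in_camp, n_hh_bldg, n_bldg_gath)
generate fpc3 = cond(in_camp, 1, n_hh_bldg)

* Design for descriptive statistics (the regression version omits the fpcs)
svyset psu [pweight=weight], strata(my_stratum) fpc(fpc) ///
    singleunit(certainty) || ssu2, fpc(fpc2) || ssu3, fpc(fpc3)
```

The -egen, group()- call is just one way to get the single stratum numbering that covers both camps and gathering strata: camp households are split by camp, while gathering households (camp_id missing) are split only by region.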

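The weight check Steve asks about in the thread (whether the summed HH weights come close to the known number of HH, inside and outside camps and by region) can be sketched the same way. Because each weight is an inverse selection probability, the sum of weights in a cell estimates the number of HH in that cell, which can then be compared with an outside count such as the agency figures Karin mentions. The data, the known_hh counts, and all variable names here are hypothetical, and how close counts as "close" is a judgment call the sketch does not settle.

```stata
* Hypothetical toy data: one row per sampled HH. known_hh is an assumed
* external count of all HH in the region x camp/gathering cell,
* repeated on every row of that cell.
clear
input region in_camp weight known_hh
1 1 40 85
1 1 40 85
1 0 90 170
1 0 90 170
2 1 35 72
2 1 35 72
2 0 80 150
2 0 80 150
end

* The sum of HH weights in a cell estimates the number of HH in it
bysort region in_camp: egen double est_hh = total(weight)
generate double ratio = est_hh / known_hh

* One line per cell; ratios far from 1 point to problems with the weights
egen byte first = tag(region in_camp)
list region in_camp est_hh known_hh ratio if first, noobs

* Estimated totals by region and overall, for the same comparison
bysort region: egen double est_hh_region = total(weight)
summarize weight, meanonly
display "Estimated total no. of HH: " r(sum)
```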