Statalist The Stata Listserver



Re: st: svyset question


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: svyset question
Date   Fri, 20 Apr 2007 12:17:22 -0400

Jeff, I recommend that you consult a sampling statistician. You have:

CODE PROBLEMS

1. With only one district selected within each stratum, Stata has no replicates with which to compute a standard error. You have a couple of choices:

   (a) Combine neighboring regions to get three strata of two districts each. Be warned that, with 6 - 3 = 3 degrees of freedom, the standard error multiplier from the t distribution for 95% intervals will be 3.18 (compared to 1.96 for a normal approximation).

   (b) If regions do not differ much (but how can you tell with only one observation per region?), omit the stratum specification and get 6 - 1 = 5 degrees of freedom for the highest level of sampling. The t-multiplier in this case is 2.57, about 20% less.

   Personally I would go with 90% intervals, so that the t-multiplier is 2.35 for 3 strata.

2. The weight variable must be specified in a [pweight= ] expression before the comma, not as a bare variable name after the options.
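The multipliers quoted in point 1 and the weight placement in point 2 can be checked with a short sketch along these lines. The variable names wt and stratum3 are hypothetical (district and region come from the original question), and the recoding assumes that region is numeric, coded 1 to 6, with consecutive codes marking neighboring regions. None of that is stated in the question, so adapt it to the actual data.

```stata
* Sketch only: wt and stratum3 are hypothetical names.

* t multipliers for two-sided intervals
* 95%, 3 df: 6 districts minus 3 collapsed strata
display invttail(3, 0.025)
* 95%, 5 df: 6 districts, no strata
display invttail(5, 0.025)
* 90%, 3 df
display invttail(3, 0.05)
* 95%, normal approximation
display invnormal(0.975)

* First choice: collapse the six regions into three strata.
* ASSUMPTION: region is coded 1-6 and consecutive codes are neighbors.
generate stratum3 = ceil(region/2)
svyset district [pweight = wt], strata(stratum3)

* Second choice: drop the stratum specification altogether.
svyset district [pweight = wt]
```

Only one of the two svyset commands would be kept in practice; each call replaces the previous survey settings.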

CONCEPTUAL PROBLEMS

3. It looks like there are 3-4 stages of sampling: (1) district within stratum; (2) village/ward within district; (3) household (hh) within village/ward; (4) person within hh. Only the first three would be specified if only one person is selected from each hh.

4. It looks like there was a second level of stratification (urban vs. rural) somewhere in the design. Your description of "PPS" sampling makes no sense unless this is true.
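As a rough illustration of how the stages and a possible second level of stratification might be declared, here is a minimal sketch. The names hhid, wt, stratum3 and urban are hypothetical; district and samplingunit are taken from the question. Whether urban/rural stratification really happened at the village/ward stage is a guess that only the survey documentation can confirm.

```stata
* Sketch only: hhid, wt, stratum3 and urban are hypothetical names.

* Three stages: district, village/ward, household
* (one person selected per household).
svyset district [pweight = wt], strata(stratum3) || samplingunit || hhid

* ASSUMPTION: villages/wards were stratified by urban vs. rural
* within district. If so, the second stage takes its own strata() option.
svyset district [pweight = wt], strata(stratum3) ///
    || samplingunit, strata(urban) || hhid

* If several persons were selected per household, the person becomes
* a fourth stage, which can be declared with _n.
svyset district [pweight = wt], strata(stratum3) ///
    || samplingunit || hhid || _n

* Check strata and sampling-unit counts under the current settings.
svydescribe
```

The `///` line continuations work in a do-file; each svyset call overrides the one before it, so keep only the version that matches the actual design.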

ANALYTIC ISSUES

5. In a survey of this size, especially with no replication at the first stage, some post-stratification or sample raking would be standard practice.
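For the post-stratification part, svyset accepts poststrata() and postweight() options. The sketch below assumes a hypothetical variable agesex that identifies the post-strata and a hypothetical variable popsize that holds the known population total for each post-stratum; wt is again a hypothetical weight variable. Raking is not covered by these options and would need a separate routine.

```stata
* Sketch only: agesex, popsize and wt are hypothetical names.
* agesex  : post-stratum identifier (for example, age group by sex)
* popsize : known population count in each post-stratum
svyset district [pweight = wt], poststrata(agesex) postweight(popsize)
```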


Regards,

Steven

On Apr 20, 2007, at 10:24 AM, Jeff Edmeades wrote:


Hi all,

I am working with survey data with the following design (as described to
me): "Respondents were selected through stratified cluster sampling,
with one district randomly selected from six geographic regions. Ten
sampling units (villages in rural areas and urban wards in urban areas)
were then selected in each district through probability proportional to
size sampling, with purposeful oversampling of urban areas to ensure
sufficient cases for the analysis of rural-urban differences. A
household listing was conducted in each of the sampling units, from
which 40 eligible individuals were randomly selected."

My understanding of the correct syntax for this (following from the SVY
manual) is:

svyset district, strata(region) fpc(ndistricts) || samplingunit
[sampling weight for urban oversample] fpc(nsamplingunits)

Is this correct??

Many thanks,


