Some health centers were sampled in both years, as some blocks overlap. The program reports describe the design as stratified two-stage sampling. Here is the description:

"stage 1 - block as the first geographical stratum and area covered by the health center (or health center) as the primary sampling unit
stage 2 - all eligible respondents within the P.S.U, would be secondary sampling units - selected by proportionate random sampling from the P.S.U. These respondents are randomly selected from two separate lists of children's age groups (obtained from house-listing exercises). The respondents are mothers of children 0-5 months and 6-23 months of age."

(As such, I have 4 data sets in total: one for each age group and for each year, '09 and '11.)

So, after appending I would have these variables: wt1 = pw (for 2009); wt2 = pw (for 2011); and similarly psu1, psu2, ssu1, ssu2, strata1, strata2, superstrata1, superstrata2.

I just wanted to know the Stata syntax for the svyset command after appending the two data sets, for a particular age group.

Thank you for your time :)

Ameya

On Thu, Nov 1, 2012 at 11:01 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Ameya, For an SRS design you don't need to get the population N in each
> stratum, just the number of centers in each stratum and the number of
> eligible respondents in each sampled center. The data will contain,
> obviously, the number of selected centers and selected respondents in
> each.
>
> You have a potential bias problem if the design was SRS and population
> "sizes" of the health centers were skewed, e.g. there were relatively few
> "large" centers and more "small" ones. In such a case, respondents from
> smaller centers may be over-represented. The only simple fix is
> post-stratification by center "size". Additionally, consider adding
> center "size" to the regression models (see example below).
>
>> appending adds observations and I want to compare
>> trends across both years), how do I do that?
>
> If you wish to compare means or proportions
> (let csize be a grouping of center sizes)
> ***********************
> svy: mean myvar, over(year)
> xi: svy: reg myvar i.year
>
> svy: mean myvar, over(year csize)
> xi: svy: reg myvar i.year i.csize i.year*i.csize
> **********************
>
> For some sampling references, see:
> http://www.stata.com/statalist/archive/2012-09/msg01058.html.
>
>
> Steve
>
> On Oct 31, 2012, at 6:41 PM, Stas Kolenikov wrote:
>
> On 1, 2, 3, the short answers are "yes", "yes" and "yes". The longer
> answers depend on what you have at hand. If you had a simple random
> sample at each stage, then you simply multiply through the ratios (# of
> units sampled)/(# of units in the population) to get the probability
> of selection. A smarter survey statistician would design a PPS survey,
> in which hospitals would be selected with probabilities proportional
> to the measure of size (# of beds, # of hospitalized, etc.). You
> obviously have to make the names of your survey design variables the
> same in the two data sets.
>
> A short answer to 4 is to -generate int year=2009- in one data set and
> -year=2011- in the other before appending. I am not sure what the
> best way to approach 5 is, as it really depends on the computing
> capacity you may have at hand. 800 variables and 10,000 observations
> would produce at most a 64 MB data set, and one would really have to go
> back to the hardware of the late 1990s to have problems with a data set
> of this size.
>
>
> --
> -- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name
> -- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at
> srbi dot com
> -- Opinions stated in this email are mine only, and do not reflect the
> position of my employer
>
> On Wed, Oct 31, 2012 at 5:13 PM, Ameya Bondre
> <ameyabondre.jhsph@gmail.com> wrote:
>> My name is Ameya Bondre and I am working on two survey data sets for a
>> sustainability study, and had a few questions.
>>
>> The study design:
>>
>> To give you a background - I have to compare a range of conditions
>> (health behaviors, diseases and health services) in a region, at the
>> end of a health program (year 2009 - endline survey), with similar
>> conditions two years after the program stopped (year 2011 - evaluation
>> survey, to measure sustainability of program activities). I have two
>> data sets for the two cross-sectional surveys conducted in 2009 and
>> 2011. The surveys are independent (as in, the sampling was done again
>> in 2011). The populations surveyed each time are different
>> cross-sections of the same region. Both surveys involve the same
>> sampling technique with "block" as the stratum, "health center" as the
>> primary sampling unit and "respondents/mothers" as the secondary
>> sampling unit (but the variable names for these design variables are
>> different in 2009 and 2011 data sets). I am using STATA 10. No FPC
>> correction has been applied as per the program reports.
>>
>> Questions (sampling weights and svy command):
>>
>> 1) I have probability weights already given in the 2009 data sets but
>> I don't have those built in, for the 2011 data sets. I have been told
>> that the entire sampling method was similar for both years. Am I
>> understanding correctly that I first need to calculate weights for all
>> observations for 2011, then append data sets, and then set up the
>> combined data set as a "survey set"?
>>
>> 2) Further, do I need to create the sampling weight variable by
>> calculating probability weights for 2011 observations (which I already
>> have for 2009)? If yes, what's the method to get weights - would I
>> require the region's population (N) in 2011?
>>
>> 3) Do I need to create new design variables for the svyset command,
>> after appending the two data sets?
>> (like one variable for psu, strata,
>> weight - taking both data sets into account)
>>
>> Questions (appending data sets)
>>
>> 4) In appending, I am not able to label the variables/observations for
>> 2011 separately from 2009, to identify them as "2009" and "2011"
>> variables (as appending adds observations and I want to compare
>> trends across both years), how do I do that?
>>
>> 5) Since I am using STATA 10 with limited memory and my data sets are
>> huge (800-odd variables and sample sizes in thousands), can I append
>> a few variables at a time (that I need to analyze, for certain
>> regressions), instead of the entire data set - would that affect the
>> survey design of the new combined data set, after appending?
>>
>>
>>
>> Please do let me know if any question is not clear. Thanks for your time.

Best,
Ameya Bondre
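
For the svyset question at the top of this thread, here is one possible sketch for a single age group, not a definitive setup. It follows Stas's advice to give the design variables the same names in both files and to generate a year variable before appending. The file names, the outcome myvar, and the new names pw, psu, stratum, stratum_yr and psu_yr are all hypothetical. Crossing year with block and with health center is an assumption based on the two samples having been drawn independently; a center sampled in both years is then treated as two separate PSUs.

```stata
* --- 2009 file (hypothetical file name); load only the variables needed
use wt1 psu1 strata1 myvar using "age0_5_2009.dta", clear
rename wt1 pw
rename psu1 psu
rename strata1 stratum
generate int year = 2009
save "tmp_2009.dta", replace

* --- 2011 file (hypothetical file name); same names after renaming
use wt2 psu2 strata2 myvar using "age0_5_2011.dta", clear
rename wt2 pw
rename psu2 psu
rename strata2 stratum
generate int year = 2011

* --- stack the two cross-sections
append using "tmp_2009.dta"

* --- assumption: independent samples, so strata and PSUs are kept
*     distinct by year even when the block or center IDs coincide
egen long stratum_yr = group(year stratum)
egen long psu_yr     = group(year psu)

* --- declare the design: first-stage units, weights, strata (no FPC)
svyset psu_yr [pweight = pw], strata(stratum_yr)

* --- compare years, as in Steve's example
svy: mean myvar, over(year)
```

Only the first stage is declared because no FPC is used; without an FPC, Stata's linearized variance estimator does not use later sampling stages. Loading a subset of variables with -use varlist using- keeps memory use low, provided the weight, PSU and stratum variables are always among them.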


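
If the 2011 weights have to be built by hand, the calculation Stas describes (multiply the sampling fractions at each stage, then invert) might look like the sketch below, assuming simple random sampling at both stages. N_centers (number of health centers in the block) and M_elig (number of eligible mothers on the center's house list for this age group) are hypothetical variables that would have to come from the program's sampling records; stratum and psu stand for the block and health-center identifiers. Nonresponse adjustments and any PPS selection are ignored here.

```stata
* Respondents actually sampled in each center
bysort stratum psu: generate long m_samp = _N

* Centers actually sampled in each block:
* flag one row per center, then count the flags within the block
egen byte one_per_psu = tag(stratum psu)
bysort stratum: egen long n_centers = total(one_per_psu)

* Selection probability = (n_centers/N_centers) * (m_samp/M_elig);
* the probability weight is its inverse
generate double pw = (N_centers / n_centers) * (M_elig / m_samp)

* Rough check: the weights should sum to about the number of
* eligible mothers in the region for this age group
summarize pw
display "Sum of weights: " r(sum)
```

If the resulting 2011 weights look very different in scale from the supplied 2009 weights, that would suggest the 2009 weights were built differently and the program reports should be checked before the files are combined.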