
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: appending two survey data sets

From   Ameya Bondre <>
Subject   st: appending two survey data sets
Date   Wed, 31 Oct 2012 15:13:06 -0700

My name is Ameya Bondre. I am working on two survey data sets for a
sustainability study and have a few questions.

The study design:

To give you some background: I have to compare a range of conditions
(health behaviors, diseases and health services) in a region at the
end of a health program (year 2009, endline survey) with similar
conditions two years after the program stopped (year 2011, evaluation
survey, to measure sustainability of program activities). I have two
data sets, one for each of the cross-sectional surveys conducted in
2009 and 2011. The surveys are independent (that is, the sample was
drawn again in 2011). The populations surveyed each time are different
cross-sections of the same region. Both surveys use the same sampling
technique, with "block" as the stratum, "health center" as the primary
sampling unit and "respondents/mothers" as the secondary sampling unit
(but the names of these design variables differ between the 2009 and
2011 data sets). I am using Stata 10. According to the program
reports, no finite population correction (FPC) has been applied.

Questions (sampling weights and svy command):

1) Probability weights are already provided in the 2009 data set, but
not in the 2011 data set. I have been told that the entire sampling
method was similar for both years. Am I understanding correctly that I
first need to calculate weights for all 2011 observations, then append
the data sets, and then declare the combined data set as survey data
with svyset?

2) Further, do I need to create the sampling weight variable by
calculating probability weights for the 2011 observations (which I
already have for 2009)? If yes, what is the method for obtaining the
weights? Would I need the region's population (N) in 2011?

3) Do I need to create new design variables for the svyset command
after appending the two data sets (for example, one variable each for
PSU, strata and weight that covers both data sets)?
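
To make question 2 concrete: a probability weight is the inverse of an observation's overall selection probability. Below is a minimal sketch of what such a calculation might look like in Stata. It assumes simple random sampling at both stages, which is only an assumption; the actual 2011 design (for example, selection proportional to size) may call for a different formula. All variable names are hypothetical and do not come from the data sets.

```stata
* Hypothetical 2011 variables (names are illustrative only):
*   N_hc  = number of health centers in the block (stratum)
*   n_hc  = number of health centers sampled in the block
*   M_mom = number of eligible mothers at the health center
*   m_mom = number of mothers interviewed at the health center
* Assumption: simple random sampling at both stages.
generate double p_stage1 = n_hc / N_hc     // probability the health center was selected
generate double p_stage2 = m_mom / M_mom   // probability the mother was selected within it
generate double wt2011   = 1 / (p_stage1 * p_stage2)
label variable wt2011 "Sampling weight, 2011 (inverse selection probability)"
```

Under that assumption, the inputs would be sampling-frame counts per block and per health center rather than only the region's total population.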

Questions (appending data sets):

4) After appending, I am not able to tell the 2011 observations apart
from the 2009 ones (appending adds observations, and I want to compare
trends across both years). How do I label them as "2009" and "2011"
observations?

5) Since I am using Stata 10 with limited memory and my data sets are
huge (800-odd variables and sample sizes in the thousands), can I
append only a few variables at a time (those that I need for certain
regressions) instead of the entire data set? Would that affect the
survey design of the new combined data set after appending?
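
The workflow that questions 3 to 5 describe might be sketched as follows. This is an illustration under stated assumptions, not a confirmed solution: every file and variable name is hypothetical, the 2011 weight is assumed to have been computed already, and treating each year-by-block combination as its own stratum is one possible way to reflect that the two samples were drawn independently.

```stata
* All file and variable names below are hypothetical.

* 2011: load only the design variables, the weight and the analysis variable
use blk11 hc11 wt2011 outcome using survey2011, clear
rename blk11  block
rename hc11   hc
rename wt2011 wt
generate int year = 2011          // marks these rows as 2011 observations
tempfile s2011
save `s2011'

* 2009: same subset, harmonized to the same variable names
use block09 hc09 wt09 outcome using survey2009, clear
rename block09 block
rename hc09    hc
rename wt09    wt
generate int year = 2009          // marks these rows as 2009 observations

* Stack the 2011 observations under the 2009 observations
append using `s2011'

* Assumption: independent samples, so strata and PSUs are kept
* distinct across years
egen stratum = group(year block)
egen psu     = group(year block hc)

svyset psu [pweight = wt], strata(stratum)
svy: mean outcome, over(year)     // compare the two years
```

In this sketch the `use varlist using filename` form keeps memory use down by reading only the listed variables. The subset always includes the stratum, PSU and weight variables, since svyset needs them regardless of which analysis variables are loaded.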

Please let me know if any question is unclear. Thanks for your time.

Ameya Bondre
