st: DHS Womens Data Survey Setup

From   Melissa Daniels
Subject   st: DHS Womens Data Survey Setup
Date   Sat, 16 Jul 2011 00:23:09 -0500

Hello fellow Stata users,

I am working on an analysis of DHS women's data (Ghana, 2008) using
Stata 11.2. My sample includes only women with infants in the 0-23 month age
range. DHS data are collected as a two-stage stratified sample of households.

I want to identify all the survey variables I may need and construct the
dataset properly for a survey analysis. I am still constructing the
dataset, but plan to use the following variables (as defined in
DHS Recode 5) and svyset statement.

* v021 indicates the enumeration areas for the survey.
gen psu = v021
* v022 defines the pairings or groupings of primary sampling units used in
* Taylor series expansion.
gen strata1 = v022
* v023 indicates the sample domain, or the basic geographic units within
* which the sample was self-weighted.
gen strata2 = v023
* v005 holds the probability weights for the sample; dividing by 10^6 is the
* decimal correction directed by DHS.
gen m_weight = v005/10^6

svyset psu [pweight=m_weight], strata(strata1)

I have a couple of questions:

1) I understand variance estimation is based on the Taylor series expansion
method, so I assume v022 (strata1 above) is the strata variable I am most
interested in. In what cases would the sample domain variable v023 be
of use to me? Is it important for survey estimation?

2) I believe I need data on the full sample of women in order to estimate
corrected variances for the subset of women I am interested in. Does
that mean I need to create my dataset with all women, or with all
individuals in the larger dataset? Or is my dataset complete, since the
subsample should be evenly dispersed throughout regions?
If I need a larger dataset, do I just use a variable to flag women with
children of the correct age for my subsample and then restrict all estimation
commands to the subsample using an if statement?
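
To make question 2 concrete, here is a rough sketch of the flag approach I
have in mind. It is only a sketch under assumptions: it assumes the full
women's recode file is in memory, that v008 (interview date, CMC), b3_01
(birth date of the most recent child, CMC) and b5_01 (most recent child
alive) follow the standard recode, and "outcome" is a hypothetical
placeholder for whatever variable is being analyzed. It uses svy's subpop()
option in place of an if qualifier; whether that is the right choice is part
of what I am asking.

* Assumes the full DHS women's recode file is already loaded.
gen m_weight = v005/10^6
svyset v021 [pweight=m_weight], strata(v022)

* Age in months of the most recent child (missing if the woman has no births).
* v008 and b3_01 are assumed to be CMC dates as in the standard recode.
gen child_age = v008 - b3_01

* Flag is 1 for women whose most recent living child is 0-23 months old, else 0.
gen byte infant_flag = (child_age <= 23) & (b5_01 == 1) & !missing(child_age)

* "outcome" is a hypothetical placeholder for the analysis variable.
svy, subpop(infant_flag): mean outcome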

3) I am interested in looking at biomarkers on a separate subsample who
consented to a blood draw. However, there are no weights that I can
locate for this subsample. Do I use the same weights as above, or do I
need to create some sort of weight using the rate of consent?

4) I haven't been able to find any variables related to the finite
population correction (FPC), likely because the sampling fraction is small
for DHS. According to my understanding, FPC is not a concern for this
analysis - please correct me if I'm wrong.

Thank you sincerely,
Melissa Daniels