
# Re: st: DHS Womens Data Survey Setup

From: Steven Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: DHS Womens Data Survey Setup
Date: Sun, 17 Jul 2011 14:48:43 -0400

```
> On Jul 16, 2011, at 1:23 AM, melissa daniels wrote:
>
>
> I am working on an analysis of DHS women's data (Ghana, 2008) using
> STATA 11.2. My sample includes only women with infants in the 0-23 month age
> range. DHS data are collected as a two-stage stratified sample of households.
>
> I want to identify all necessary survey vars I may need and use proper
> dataset construction for a survey analysis. I am still constructing the
> dataset, but am planning to use the following variables (as defined in
> DHS recode 5)
> and survey set statement.
>
> gen psu = v021            // enumeration areas for the survey
> gen strata1 = v022        // pairings or groupings of primary sampling
>                           // units used in Taylor series expansion
> gen strata2 = v023        // sample domain: the basic geographic units
>                           // within which the sample was self-weighted
> gen m_weight = v005/10^6  // probability weights for the sample
>                           // (decimal correction as directed by DHS)
>
> svyset psu [pweight=m_weight], strata(strata1)
>
> I have a couple questions:
>
> 1) I understand variance estimation is based on the taylor series expansion
> method, so I assume v022 (strata1 above) is the strata var
> I am most interested in. In what cases would the sample domain var v023 be
> of use to me? Is it important for survey estimation?
>
> 2) I believe I need data on the full sample of women in order to estimate
> corrected variances on the subset of women I am interested in. Does
> that mean I need to create
> my dataset with all women, or all individuals in the larger dataset?
> Or is my dataset complete since the
> subsample should be evenly dispersed throughout regions?
> If I need a larger dataset, do I just use a variable to flag women with
> children of the correct age for my subsample then and restrict all estimation
> commands to the subsample using an if statement?
>
> 3) I am interested in looking at biomarkers on a separate subsample who
> consented to a blood draw. However, there are no weights that I can
> locate for this subsample.
> Do I use the same weights as above, or do I need to
> create some sort of weight using the rate of consent?
>
> 4) I haven't been able to find any variables related to the finite population
> correction (FPC), likely because the sampling fraction is small
> for DHS. According to my understanding, FPC is not a concern for this
> analysis - please correct me if I'm wrong.
>
>

1. v023 is a regional or region/urban-rural stratum variable in which the sampling strata v022 are nested. It can be very useful for studying geographic variation. You are quoting a general definition of v023 from the DHS recode manual, but you are responsible for knowing what v023 is in _your_ survey.

2. You need information about all members of the data set to get valid standard errors for the subpopulation. The easiest way is to svyset the entire data set and apply the "subpop()" option to every analysis command. To get a smaller working data set, follow Austin Nichols's example at: http://www.stata.com/statalist/archive/2007-11/msg00810.html

3. The authors of this publication about the 2003 Ghana survey (http://www.measuredhs.com/pubs/pdf/FR152/FR152.pdf) didn't weight for non-response in the biomarker sample. I haven't seen the documentation for the 2008 sample, but I assume that's still the case, since you found no such weight. I didn't find their 2003 argument against weighting persuasive (for one thing, they didn't check response rates by the region variable). The only way to be sure of avoiding non-response bias is to do the non-response weighting. This can increase standard errors while it decreases bias. To help decide what to do, see http://www.stata.com/statalist/archive/2011-06/msg00445.html

4. Generally, fpc's are not used in DHS analyses.

Some advice: An experienced researcher once told me that there could be undocumented issues with DHS data. (I personally know this to be true, for example, for estimating maternal mortality rates.) Before proceeding, I suggest that you contact published researchers who have used your data set or one similar. The DHS Facebook page also looks like a promising place to ask questions.

Steve

Steven J. Samuels
Consultant in Statistics
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:   206-202-4783
sjsamuels@gmail.com

```
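Points 2 and 3 of the reply can be sketched in Stata roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: `v005`, `v021`, `v022`, and `v023` are the DHS variables named in the question, while `v008`, `b3_01`, and `b5_01` are assumed to hold the interview date (CMC), the last child's birth date (CMC), and the child's survival status. The variables `outcome`, `eligible`, and `responded` are hypothetical placeholders, and the weighting-class adjustment within `v023` is only one simple way to build a non-response weight. Check every variable against the documentation for your own survey.

```stata
* Run this on the FULL women's recode file, not on a file that has
* already been cut down to the subsample of interest.

* --- Point 2: declare the design once, then use subpop() ---
gen double m_weight = v005/1000000
svyset v021 [pweight = m_weight], strata(v022)

* Flag women whose last-born child is alive and aged 0-23 months.
* ASSUMPTION: v008 = interview date (CMC), b3_01 = birth date of the
* last child (CMC), b5_01 = 1 if that child is alive.
* Women with no births have missing b3_01 and are coded 0.
gen byte infant_0_23 = (v008 - b3_01 <= 23) & (b5_01 == 1)

* subpop() keeps every observation in the variance calculation;
* an -if- restriction would not.
* "outcome" is a hypothetical analysis variable.
svy, subpop(infant_0_23): mean outcome

* --- Point 3: a simple weighting-class non-response adjustment ---
* ASSUMPTION: eligible = 1 if the woman was selected for the blood
* draw, responded = 1 if she consented (both hypothetical 0/1 flags).
* Within each v023 cell, respondents' weights are inflated so that
* they sum to the weighted total of all eligible women in that cell.
bysort v023: egen double w_elig = total(m_weight * eligible)
bysort v023: egen double w_resp = total(m_weight * eligible * responded)
gen double bio_weight = m_weight * w_elig / w_resp ///
    if eligible == 1 & responded == 1
```

For biomarker analyses, `bio_weight` would take the place of `m_weight` in the `svyset` statement. Comparing estimates and standard errors with and without the adjustment is one way to judge the trade-off between bias and variance described in point 3.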