
Re: st: DHS Ghana variable construction question

From   Friedrich Huebler
Subject   Re: st: DHS Ghana variable construction question
Date   Sun, 26 Jul 2009 08:11:31 -0400


Answer to question 1: The ages of the mother and father are given with
these commands, provided the data are sorted by household and line
number:

sort hhid hvidx
by hhid: gen mage = hv105[hv112]
by hhid: gen fage = hv105[hv114]

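The wrong parents' ages may be a consequence of missing observations
in a household. The subscript [hv112] picks the observation in that
position within the household, which is the mother only if every line
number up to hers is present in the data. If a line is missing, the
subscript points to a different person, possibly the child itself. See
these posts from the Statalist archive for a possible solution: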
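
One way around the problem, sketched here on invented data, is to match
on the line number itself instead of the position within the household.
The toy values and the names mage_pos, hhsize and maxsize are
assumptions for illustration only, and hhid is numeric here although it
is usually a string in DHS files. This is not necessarily the approach
taken in the archived posts.

```stata
* Toy household (invented values): line 2 is absent from the data,
* so position within the household and line number disagree
clear
input hhid hvidx hv105 hv112 hv114
1 1 35 0 0
1 3 30 0 0
1 4  2 3 1
end

sort hhid hvidx

* Positional lookup: for the child, hv105[3] is the third row of the
* household, which is the child itself
by hhid: gen mage_pos = hv105[hv112]

* Lookup by matching the parent's line number against hvidx
gen mage = .
gen fage = .
by hhid: gen hhsize = _N
summarize hhsize, meanonly
local maxsize = r(max)
forvalues i = 1/`maxsize' {
    by hhid: replace mage = hv105[`i'] if hv112 == hvidx[`i']
    by hhid: replace fage = hv105[`i'] if hv114 == hvidx[`i']
}

list hhid hvidx hv105 hv112 hv114 mage_pos mage fage
```

In the listing, the child in line 4 gets its own age of 2 from the
positional lookup, while the matched lookup returns the ages of the
persons in lines 3 and 1.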

Answer to question 2: The best you can do is use the wealth index as
an indicator of relative household wealth. For more information read
this article:

Deon Filmer and Lant H Pritchett, “Estimating wealth effects without
expenditure data - or Tears: An application to educational enrollments
in states of India,” Demography 38, no. 1 (February 2001): 115-132.

Answer to question 3: As an example, assume you want to consider only
children that have the same mother and father.

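* Create unique ID for all groups of siblings
egen sid = group(hhid hv112 hv114)
* Count the members of each sibling group
bysort sid: egen sno = count(sid)
* Keep the count only for children under 5 who belong to a sibling group
replace sno = . if hv105>=5 | missing(sid)
* Subtract the child itself to obtain the number of siblings
replace sno = sno - 1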

Some children cannot be identified as siblings from the mother's and
father's line number, among them children whose parents are dead, do
not live in the same household or for whom the parents' line numbers
are missing. To exclude these children modify the code above:

egen sid = group(hhid hv112 hv114) if hv112>0 & hv112<99 & hv114>0 & hv114<99
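
To see the restricted grouping at work, here is a small sketch on
invented data (numeric hhid and made-up ages and line numbers, all
assumptions for illustration). The missing(sid) condition is an
addition of this sketch: without it, children who have no sibling
group would end up with a count of -1 instead of missing.

```stata
* Toy household (invented values): two parents in lines 1 and 2,
* two of their children, and one child whose parents are not listed
clear
input hhid hvidx hv105 hv112 hv114
1 1 40 0 0
1 2 38 0 0
1 3  4 2 1
1 4  2 2 1
1 5  3 0 0
end

* Sibling groups only for children with both parents in the household
egen sid = group(hhid hv112 hv114) if hv112>0 & hv112<99 & hv114>0 & hv114<99
bysort sid: egen sno = count(sid)
* Keep the count only for children under 5 with a valid sibling group
replace sno = . if hv105>=5 | missing(sid)
replace sno = sno - 1

sort hhid hvidx
list hhid hvidx hv105 hv112 hv114 sid sno
```

The children in lines 3 and 4 each have one sibling, and the child in
line 5 is left with a missing value.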

With the commands above, children who do not share both parents but
only have the same mother or father cannot be identified as siblings.
For further reading I recommend these Stata FAQs:
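
As a rough sketch of the half-sibling case mentioned above, the same
logic can be applied to the mother's line number alone. The variable
names msid and msno and the toy data are assumptions for illustration.
The count covers only children whose mother lives in the same
household.

```stata
* Toy household (invented values): the children in lines 3 and 4
* share the mother in line 2, but only one has the father listed
clear
input hhid hvidx hv105 hv112 hv114
1 1 40 0 0
1 2 38 0 0
1 3  4 2 1
1 4  2 2 0
end

* Group children by household and mother's line number only
egen msid = group(hhid hv112) if hv112>0 & hv112<99
bysort msid: egen msno = count(msid)
* Keep the count only for children under 5 with a valid group
replace msno = . if hv105>=5 | missing(msid)
replace msno = msno - 1

sort hhid hvidx
list hhid hvidx hv105 hv112 hv114 msid msno
```

Replacing hv112 with hv114 gives the corresponding count for children
who share a father.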


On Sat, Jul 25, 2009 at 9:47 AM, Tharshini Thangavelu wrote:
> Hi,
> I have a few questions about the DHS dataset.
> Question 1.
> Responding first to Friedrich Huebler's answer of 2009-06-22.
> I tried what you suggested in order to get the ages of the mother and father, but
> those variables seem odd to me. Some of the values are 2, which is
> impossible: no mother or father can be 2 years old. There are
> variables indicating the partner's age (v730) and the mother's age (v447a). I just don't
> understand why the two variables created with the following commands
> by hhid: gen mage = hv105[hv112]
> by hhid: gen fage = hv105[hv114]
> don't give the same values in years as v730 and v447a. Normally this
> should be the case, shouldn't it?
> Question 2. - Disposable household income variable
> I would like to create a variable for disposable household income. This variable
> doesn't exist in the DHS data, but I would like to use a proxy variable that
> is available in the data. The suggested proxy variables are:
> . wealth index hv270 (levels 1-5, where 1 is the poorest and 5 is the richest)
> . respondent currently working v714 (the respondents are all women, so this
> is not a good proxy for household income)
> Other potential proxy variables are:
> . partner's occupation v704 v705
> . respondent's occupation v716 v717
> I don't know which variable can show disposable household income in an
> efficient and consistent way.
> Question 3. - Number of siblings
> I would like to create, if possible, the number of siblings for children under 5 (my
> dependent variable).
> How can I create this variable? I have looked for a variable for the number of
> siblings. However, looking closely at the data, I find the variables
> . Number of household members h009
> . Number of children under five years h014
> My reasoning for creating the number of siblings is the following. Looking closely at
> these two variables shows this:
> Ex: in row 5, h009 is 5 and h014 is 2. Thus, this particular
> household has 5 members, 2 of them children under five. The one
> member that is left, who is this person? Is it a sibling, as I am assuming, or
> another family member such as a relative? Furthermore, I don't know if both
> parents are alive in the household. To check if both parents are alive
> I use this method:
> sort hhid hvidx
> by hhid: gen mother = hv010[hv112]
> by hhid: gen father = hv011[hv114]
> hv010 and hv011 are the number of eligible women and the number of eligible men in
> the household. hv112 and hv114 denote the mother's and father's line numbers,
> respectively. However, there are two other variables, sh11 and sh13, which
> also indicate the mother's and father's line numbers. Does it matter which one I use?
> Somehow, it doesn't give me the desired results. Instead I try to combine it with
> the variables
> . mother alive sh10
> . father alive sh12
> This I just check with the edit command. In the end, how can I verify whether the one member
> is actually a sibling? Because this is the variable that I am looking for.
> So, if someone can enlighten me on these three questions, I would be happy.
> Best regards
> Tharshini
