Statalist


Re: st: DHS Ghana variable construction question


From   Tharshini Thangavelu <thth4658@student.su.se>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: DHS Ghana variable construction question
Date   Mon, 27 Jul 2009 15:42:38 +0200 (CEST)

Hi,

First, in response to Friedrich:

1.) The suggested commands do not seem to work for getting the ages of the mother
and father. I investigated this problem further and found the following:

. tab hv105
     Age of |
  household |
    members |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        772       22.69       22.69
          1 |        706       20.75       43.45
          2 |        655       19.25       62.70
          3 |        689       20.25       82.95
          4 |        553       16.26       99.21
          5 |         27        0.79      100.00
------------+-----------------------------------
      Total |      3,402      100.00

As you can see from the table above, the age of household members takes only
the values 0-5, which explains why I cannot get the mother's and father's ages
correctly. Furthermore, hv105 seems to correspond to hc1 (age in months), just
as hc27 (sex) corresponds to hv104 (sex of household member).

Does this mean that my merged dataset is not reliable? Where is the problem? I
believe that I have followed the merging document that is available when
downloading datasets from DHS.

Can I ignore this problem and simply take the mother's and father's ages from
the following variables: v730 (partner's age) and v447a (woman's age in years
from the household report)?
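
One possible explanation, offered as a sketch rather than a diagnosis:
hv105[hv112] picks the hv112-th observation within each household, which is the
mother only if every household member is present and sorted by hvidx. If the
merged file holds only young children (as the 0-5 range of hv105 suggests),
looking the parents up by line number in the full household member file may
work better. The file names members.dta and children.dta below are
hypothetical, and the merge m:1 syntax assumes Stata 11 or later.

```stata
* Sketch only: members.dta is a hypothetical name for the full household
* member file, containing hhid, hvidx (line number) and hv105 (age).

* Lookup file for mothers: key on the line number, renamed to hv112
use hhid hvidx hv105 using members.dta, clear
rename hvidx hv112
rename hv105 mage
tempfile mothers
save `mothers'

* Lookup file for fathers: key on the line number, renamed to hv114
use hhid hvidx hv105 using members.dta, clear
rename hvidx hv114
rename hv105 fage
tempfile fathers
save `fathers'

* children.dta is a hypothetical name for the merged child-level file
use children.dta, clear
merge m:1 hhid hv112 using `mothers', keep(master match) nogenerate
merge m:1 hhid hv114 using `fathers', keep(master match) nogenerate

* Children whose parent is not listed in the household keep missing ages
summarize mage fage
```

Matching on the line number rather than on the observation number does not
depend on which household members remain in the child-level file.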

2.) Thanks for the suggested articles, I will definitely take a closer look at
them.

3.) Creating the number of siblings produced the following results:
. tab sno

        sno |      Freq.     Percent        Cum.
------------+-----------------------------------
         -1 |        966       28.62       28.62
          0 |      1,198       35.50       64.12
          1 |      1,112       32.95       97.07
          2 |         91        2.70       99.76
          3 |          8        0.24      100.00
------------+-----------------------------------
      Total |      3,375      100.00


I wanted to double check whether this is reasonable. My intuition was that if
v218 (number of living children) equals sno + hv014 (number of children under 5
years old), the variable should be correct. But just by looking at the data
editor, it was clear that this was not the case. How do I know that the created
variable sno is correct?
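
A sketch of one way to inspect sno, assuming sid and sno were created with the
commands quoted below from Friedrich's message. Because egen's count() counts
nonmissing values, children with a missing sid get a count of 0 and therefore
sno = -1 after the subtraction, so the -1 group is probably the children whose
sibling group could not be identified. Note also that sno only counts siblings
who are themselves observations in the data, so it need not add up to v218.

```stata
* How many children have no sibling group, and what is their sno?
count if missing(sid)
tabulate sno if missing(sid), missing

* Rebuild sno so that unidentified sibling groups stay missing
drop sno
bysort sid: egen sno = count(sid) if !missing(sid)
replace sno = . if hv105 >= 5
replace sno = sno - 1
tabulate sno, missing
```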


*****************************
QUESTION PART
*****************************

Question 1.
I have a specific question about whether certain variables exist in the DHS
data. Those who have worked extensively with these data, or are familiar with
them, can probably answer.

Does the Men's Recode file contain men's height and weight? I have been looking
for these two variables without any success. Since height and weight are
available for women, I thought they surely must be for men too. If they exist,
these two variables would be useful as control variables in my analysis.

Question 2.
Since DHS is a survey, I know there is a command, svy, which accounts for some
of the survey characteristics. I have read about the command, and apparently
its use should be avoided under certain circumstances. I am using simple OLS
and 2SLS (IV method). I know that applying this command yields consistent
standard errors, p-values and confidence intervals. My question is whether the
difference is large enough that I should use the command. Is the command
applicable to 2SLS with the IV method?
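
A minimal sketch of how the survey design could be declared, assuming the
household-level design variables hv005 (sample weight, stored multiplied by
1,000,000), hv021 (primary sampling unit) and hv022 (stratum) are in the merged
file; the survey's final report should be checked for the actual
stratification. The names y, x1, x2, endog and z1 are placeholders. svy can
prefix both regress and ivregress, so one declaration serves OLS and 2SLS.

```stata
* Rescale the DHS sample weight (stored as weight * 1,000,000)
generate double wgt = hv005 / 1000000

* Declare the design: PSU, probability weight and strata (assumed variables)
svyset hv021 [pweight = wgt], strata(hv022)

* OLS with design-based standard errors (y, x1, x2 are placeholders)
svy: regress y x1 x2

* 2SLS with design-based standard errors (endog instrumented by z1)
svy: ivregress 2sls y x1 x2 (endog = z1)
```

Comparing these standard errors with those from the unweighted regress and
ivregress runs shows how much the design matters for a given model.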

Tharshini





On 2009-07-26, at 14:11, Friedrich Huebler wrote:
> Tharshini,
>
> Answer to question 1: The ages of the mother and father are given with
> these commands:
>
> by hhid: gen mage = hv105[hv112]
> by hhid: gen fage = hv105[hv114]
>
> The wrong parents' ages may be a consequence of missing observations
> in a household. See these posts from the Statalist archive for a
> possible solution:
>
> http://www.stata.com/statalist/archive/2006-06/msg00321.html
> http://www.stata.com/statalist/archive/2006-06/msg00323.html
>
> Answer to question 2: The best you can do is use the wealth index as
> an indicator of relative household wealth. For more information read
> this article:
>
> Deon Filmer and Lant H Pritchett, “Estimating wealth effects without
> expenditure data - or Tears: An application to educational enrollments
> in states of India,” Demography 38, no. 1 (February 2001): 115-132.
>
> Answer to question 3: As an example, assume you want to consider only
> children that have the same mother and father.
>
> * Create unique ID for all groups of siblings
> egen sid = group(hhid hv112 hv114)
> * Count number of siblings of children under 5
> bysort sid: egen sno = count(sid)
> replace sno = . if hv105>=5
> replace sno = sno - 1
>
> Some children cannot be identified as siblings from the mother's and
> father's line number, among them children whose parents are dead, do
> not live in the same household or for whom the parents' line numbers
> are missing. To exclude these children modify the code above:
>
> egen sid = group(hhid hv112 hv114) if hv112>0 & hv112<99 & hv114>0 & hv114<99
>
> With the commands above, children who do not share both parents but
> only have the same mother or father cannot be identified as siblings.
> For further reading I recommend these Stata FAQs:
>
> http://www.stata.com/support/faqs/data/anyall.html
> http://www.stata.com/support/faqs/data/members.html
>
> Friedrich
>
> On Sat, Jul 25, 2009 at 9:47 AM, Tharshini
> Thangavelu<thth4658@student.su.se> wrote:
>> Hi,
>>
>> I have a few questions about the DHS dataset.
>>
>> Question 1.
>> Responding first to Friedrich Huebler's answer of 2009-06-22.
>> I did as you said in order to get the ages of the mother and father, but
>> those variables look odd to me. Some of the values are 2, which is
>> impossible: no mother or father can be 2 years old. There are
>> variables for the partner's age (v730) and the mother's age (v447a). I just
>> don't understand why the two variables created with the following commands:
>>
>> by hhid: gen mage = hv105[hv112]
>> by hhid: gen fage = hv105[hv114]
>>
>> do not give the same values in years as v730 and v447a. Normally they
>> should, shouldn't they?
>>
>> Question 2. - Disposable household income variable
>> I would like to create a variable for disposable household income. This variable
>> does not exist in the DHS data, but I would like to use a proxy variable that
>> is available in the data. The suggested proxy variables are:
>>
>> . wealth index hv270 (levels 1-5, where 1 is the poorest and 5 is the richest)
>> . respondent currently working v714 (the respondents are all women, so this
>> is not a good proxy for household income)
>>
>> Other potential proxy variables are:
>> . partner's occupation v704 v705
>> . respondent's occupation  v716 v717
>>
>> I don't know which variable can capture disposable household income
>> efficiently and consistently.
>>
>> Question 3. - Number of siblings
>>
>> I would like to create, if possible, the number of siblings for children
>> under 5 (my dependent variable).
>>
>> How can I create this variable? I have looked for a variable giving the
>> number of siblings. Looking at the data closely, I found these variables:
>> . Number of household members hv009
>> . Number of children under five years hv014
>>
>> My reasoning for creating the number of siblings is as follows. Looking
>> closely at these two variables shows the following:
>>
>> Ex: in row 5, hv009 is 5 and hv014 is 2. Thus, this particular household has
>> 5 members, 2 of whom are children under five. That leaves one member besides
>> the two parents. Who is this person? Is it a sibling, as I am assuming, or
>> another family member such as a relative? Furthermore, I don't know if both
>> parents are alive and in the household. To check this, I use the following
>> method.
>>
>> sort hhid hvidx
>>
>> by hhid: gen mother = hv010[hv112]
>> by hhid: gen father = hv011[hv114]
>>
>> hv010 and hv011 are the numbers of eligible women and eligible men in
>> the household. hv112 and hv114 are the mother's and father's line numbers,
>> respectively. However, there are two other variables, sh11 and sh13, which
>> also give the mother's and father's line numbers. Does it matter which ones
>> I use?
>>
>> Somehow, this does not give me the desired results. Instead I tried
>> combining it with the variables
>>
>> . mother alive sh10
>> . father alive sh12
>>
>> which I checked with the edit command. In the end, how can I verify whether
>> the one remaining member is actually a sibling? This is the variable that I
>> am looking for.
>>
>>
>> So, if someone can enlighten me on these three questions, I would be happy.
>>
>> Best regards
>> Tharshini
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
Tharshini THANGAVELU
Forskarbacken 8 / 101
114 16 Stockholm
Sweden
Phone +46 (0)735 53 43 90
E-mail thth4658@student.su.se
