Statalist



Re: st: DHS Ghana variable construction question


From   Tharshini Thangavelu <thth4658@student.su.se>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: DHS Ghana variable construction question
Date   Wed, 29 Jul 2009 14:21:44 +0200 (CEST)

Hi,


I have been trying to work out how to produce the correct mothers' and
fathers' ages. I came across the following advice on Statalist.

http://www.stata.com/statalist/archive/2006-06/msg00323.html

However, my dataset seems a bit stranger. The following variables are used to
create mother's and father's age. I still haven't produced satisfactory
results.

hhid    hv104    hv105    hv112    hv114    mage    fage

1 1       2        4        2        1       10       4
1 1       1       10        2        1       10       4
1 1       1       42        .        .
1 1       2       36        .        .
1 1       2        2        2        1       10       4
1 2       1       28        .        .
1 4       1       33        .        .
1 5       2       24        .        .        .       .
1 6       1       12        0        0        .       .

The 0's in hv112 and hv114 denote mother not in HH and father not in HH, respectively.

Here are several things that I don't understand:
1.) hhid is the case identification, where hh denotes the household number
and id the household member. I have only shown household no. 1. There are 5
ids for the first member in household no. 1, followed by member no. 2 and
members no. 4, 5 and 6. There is no id equal to 3. Why are there 5 values in
hhid for household member 1?

2.) I use the following commands to produce mage and fage

. by hhid: gen mage = hv105[hv112]
. by hhid: gen fage = hv105[hv114]

I created the mother's and father's age variables in the household member data,
directly after loading it into Stata. I have therefore not merged this dataset
yet, which I will do at a later stage. I thought creating the parents' ages in
the original data would be more advantageous than doing it in the merged
dataset, although something tells me that it should produce the same results.

You can see from the table above that the results cannot be satisfactory.
The mother's age cannot be 10 years or the father's 4 years for the first hhid.

I used the solution proposed at the above link, but the command assert did not
work. I don't understand what can be wrong! If anyone has come across this
problem working with micro-level data, any help would be valuable!
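
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: hv112 and hv114 hold line numbers, whereas hv105[hv112] under
by hhid: picks the observation at that position within the household. In the
listing above, position 2 in the first household is the 10-year-old and
position 1 is the 4-year-old, which matches mage and fage. Position and line
number agree only if members are sorted by line number. The code below assumes
that hvidx (the line number variable renamed to linenr in the earlier message)
is still in the data and runs 1, 2, ... within each household; both are
assumptions to check, not facts about this file.

```stata
* Sketch only: assumes hvidx is the household member line number
* and that hhid together with hvidx identifies each person.
isid hhid hvidx

* Sort members by line number within each household and check that
* observation k within hhid really is the member with line number k.
bysort hhid (hvidx): assert hvidx == _n

* hv112 / hv114 = line number of mother / father (0 = not in household).
* With the sort order above, the subscript picks the parent's own row.
by hhid: gen mage = hv105[hv112] if hv112 > 0 & hv112 < .
by hhid: gen fage = hv105[hv114] if hv114 > 0 & hv114 < .
```

If the assert fails, the line numbers have gaps or the file has already been
subset, and looking parents up by position would no longer be safe.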

Thanks a lot!




------>why are there 4 id for the first household member? 




On 2009-07-29, at 01:45, Friedrich Huebler wrote:
> Tharshini,
>
> Please read the documentation for -merge- to understand how it works.
> Do not -drop- anything after -merge- besides the _merge variable. You
> have to keep all household members if you want to assign the parents'
> ages and other characteristics to a child. How to do that was
> explained in a previous post.
>
> http://www.stata.com/statalist/archive/2009-06/msg00793.html
>
> Friedrich
>
> On Tue, Jul 28, 2009 at 4:30 PM, Tharshini
> Thangavelu<thth4658@student.su.se> wrote:
>>
>> Friedrich,
>>
>> 1.) The program "select" was suggested by DHS in their FAQ section. When I wanted
>> to upload the individual recode file, I couldn't because there were too many
>> variables. As a result, I used this program. You can find more info on their
>> website.
>>
>> 2.) I did as you suggested. I uploaded the whole household member data, merged
>> it with the weight file, and did NOT use the command keep, only the drop command
>> to remove the _merge variable. Otherwise I cannot merge it with the individual
>> file. I tried and it gave me an error message: _merge already defined.
>>
>> So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
>> wrote the following command:
>>
>> merge clnr hhnr lnr using ir
>>
>> variables clnr hhnr lnr do not uniquely identify observations in the master data
>> caseid was str12 now str15
>>
>>
>> tab _merge
>>
>>     _merge |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>          1 |     23,199       78.11       78.11
>>          2 |      3,100       10.44       88.55
>>          3 |      3,402       11.45      100.00
>> ------------+-----------------------------------
>>      Total |     29,701      100.00
>>
>>
>> Now comes a tricky part for me. Using the following commands doesn't give me
>> the desired results.
>> keep if _merge==3
>> drop _merge
>>
>> Just as in the former case, tabulating hv105 (= age of household member) in
>> this file gives exactly the same answer, that is, only children's ages (0-5
>> years) are included.
>>
>> But if I don't use the keep or drop command, I have the age of ALL household
>> members.
>>
>> My question is: should I keep the "_merge" variable? According to what I have
>> been reading, I thought the point of merge was to keep only _merge == 3.
>>
>> 3.) In your former email you say that I drop all children without height and
>> weight data and all adults, including parents. In my analysis, the dependent
>> variable is child health, measured by the height-for-age Z-score and the
>> weight-for-age Z-score. For those children having these Z-scores, I need to
>> match them with their respective parents' education, age, household
>> characteristics, etc. to see if mothers and fathers with higher education have
>> children with better child health as measured by the Z-scores. Therefore,
>> shouldn't the way I was doing it be correct? Or have I misunderstood completely?
>>
>>
>> Thanks
>> Tharshini
>>
>>
>>
>>
>>
>>
>>
>> On 2009-07-28, at 15:43, Friedrich Huebler wrote:
>>> Tharshini,
>>>
>>> In step 3 you -drop- all children without height and weight data and
>>> all adults, including all parents.
>>>
>>> You write "The household member data includes to many variables to
>>> directly upload in stata." The flat household member recode file from
>>> the Ghana DHS 2003 has 245 variables. The only version of Stata that
>>> cannot hold 245 variables is Small Stata. Your -tab- output indicates
>>> that you do not have Small Stata because you were able to work with
>>> more than 26000 observations (see -help limits-). You should therefore
>>> be able to open the complete household member file with Stata. I don't
>>> know a program called "select" but it does not seem to be necessary.
>>>
>>> Friedrich
>>>
>>> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
>>> Thangavelu<thth4658@student.su.se> wrote:
>>>> Hi Friedrich,
>>>>
>>>> When I downloaded the dataset for Ghana 2003, there was a doc file included
>>>> with the height and weight file: a description of how to proceed when merging
>>>> and which identifying variables to choose in each file. I followed this doc
>>>> file and merged the files in the following way:
>>>>
>>>> 1.) The height and weight file for children up to 5 years old.
>>>> rename HWHHID caseid
>>>> rename HWLINE linenr
>>>> sort caseid linenr
>>>> save weight, replace
>>>> clear exit
>>>>
>>>> 2.) The household member data includes too many variables to upload directly in
>>>> Stata, so I used the program "select", where I selected my variables of
>>>> interest. Then I uploaded it in Stata:
>>>>
>>>> use hmr1
>>>> rename hhid caseid
>>>> rename hvidx linenr
>>>> sort caseid linenr
>>>> save hmr1, replace
>>>>
>>>> 3.) These two files were then merged together (master data = hmr1)
>>>>
>>>> merge caseid linenr using weight
>>>>
>>>> tab _merge
>>>>
>>>>     _merge |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>          1 |     22,673       85.23       85.23
>>>>          3 |      3,928       14.77      100.00
>>>> ------------+-----------------------------------
>>>>      Total |     26,601      100.00
>>>>
>>>> . keep if _merge ==3
>>>> (22673 observations deleted)
>>>>
>>>> . drop _merge
>>>>
>>>> Error message : linenr was byte now int
>>>>
>>>> My own conclusion: Since _merge == 3 has 3928 observations, which is exactly
>>>> the same number of obs. as in the weight file, I concluded the merging was
>>>> done correctly. I also tried the inverse case, i.e. having hmr as my master data.
>>>>
>>>> 4.) I then merged this resulting file with the individual recode file
>>>> (= women's file), using cluster number (clnr hv001), household number
>>>> (hhnr hv002) and mother's line number (lnr hc60).
>>>>
>>>> In the resulting file, I again renamed the identifying variables
>>>> rename HV001 clnr
>>>> rename HV002 hhnr
>>>> rename hc60  lnr
>>>> sort clnr hhnr lnr
>>>> save thesis
>>>> clear exit
>>>>
>>>> 5.) In the individual recode file, just as in the household member recode
>>>> file, I used the program "select" to choose the variables, and the following
>>>> identifying variables were renamed: cluster number (clnr v001), household
>>>> number (hhnr v002) and respondent's line number (lnr v003).
>>>>
>>>> use ir1
>>>> rename V001 clnr
>>>> rename V002 hhnr
>>>> rename V003 lnr
>>>> sort clnr hhnr lnr
>>>> save ir1, replace
>>>>
>>>> 6.)Now, I merge the ir1.dta with the thesis.dta
>>>>
>>>> merge clnr hhnr lnr using thesis
>>>> tab _merge
>>>>
>>>>     _merge |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>          1 |        526        7.48        7.48
>>>>          2 |      3,100       44.11       51.59
>>>>          3 |      3,402       48.41      100.00
>>>> ------------+-----------------------------------
>>>>      Total |      7,028      100.00
>>>>
>>>> . keep if _merge == 3
>>>> (3626 observations deleted)
>>>>
>>>> . drop _merge
>>>>
>>>> Error message: variables clnr hhnr lnr do not uniquely identify observations in
>>>> the master data. I hope this will help to solve the problem.
>>>>
>>>> / Tharshini
>>>>
>>>>
>>>>
>>>>
>>>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>>>> Tharshini,
>>>>>
>>>>> On June 11 you wrote that you wanted to merge the household member
>>>>> file with the height and weight file. In response to your message you
>>>>> received advice on how you can merge the data. The table in your
>>>>> message of today makes clear that you did not merge the files
>>>>> correctly because you only have persons up to 5 years of age. If you
>>>>> want more help with this and the other problems you described you have
>>>>> to show us your code, as explained in the Statalist FAQ.
>>>>>
>>>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>>>
>>>>> Friedrich
>>>>>
>>>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>>>
>>>>>> .tab hv105
>>>>>>   Age of |
>>>>>>  household |
>>>>>>    members |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>          0 |        772       22.69       22.69
>>>>>>          1 |        706       20.75       43.45
>>>>>>          2 |        655       19.25       62.70
>>>>>>          3 |        689       20.25       82.95
>>>>>>          4 |        553       16.26       99.21
>>>>>>          5 |         27        0.79      100.00
>>>>>> ------------+-----------------------------------
>>>>>>      Total |      3,402      100.00
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
Tharshini THANGAVELU
Forskarbacken 8 / 101
114 16 Stockholm
Sweden
Phone +46 (0)735 53 43 90
E-mail thth4658@student.su.se
