Statalist


Re: st: DHS Ghana variable construction question


From   Friedrich Huebler <fhuebler@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: DHS Ghana variable construction question
Date   Tue, 28 Jul 2009 19:45:03 -0400

Tharshini,

Please read the documentation for -merge- to understand how it works.
Do not -drop- anything after -merge- besides the _merge variable. You
have to keep all household members if you want to assign the parents'
ages and other characteristics to a child. How to do that was
explained in a previous post.

http://www.stata.com/statalist/archive/2009-06/msg00793.html
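
A minimal sketch of the idea, not a tested solution. It assumes the file and
variable names used later in this thread (hmr1, weight, caseid, linenr, hv105)
and that hc60 holds the mother's line number, as described in the earlier
message. The names mothers.dta and mother_age are hypothetical, introduced
only for this illustration. It uses the old -merge varlist using- syntax shown
in the thread, with its -uniqusing- and -nokeep- options.

```stata
* Sketch only: names are taken from the thread, except mothers.dta and
* mother_age, which are hypothetical.

* 1. Merge height/weight data onto ALL household members.
use hmr1, clear
sort caseid linenr
merge caseid linenr using weight
tab _merge
drop _merge              // drop only the marker variable, keep every person

* 2. Build a lookup file with each member's age, keyed so that a member's
*    line number can be matched against a child's mother line number (hc60).
*    Assumes caseid and linenr together identify each member uniquely.
preserve
keep caseid linenr hv105
rename linenr hc60
rename hv105 mother_age
sort caseid hc60
save mothers, replace
restore

* 3. Attach the mother's age to each child. Persons with no matching mother
*    in the household stay in the data with mother_age missing.
sort caseid hc60
merge caseid hc60 using mothers, uniqusing nokeep
drop _merge
```

Only after the parents' characteristics have been attached in this way would
it make sense to restrict the data to the children with Z-scores.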

Friedrich

On Tue, Jul 28, 2009 at 4:30 PM, Tharshini
Thangavelu<thth4658@student.su.se> wrote:
>
> Friedrich,
>
> 1.) The program "select" was suggested by DHS in its FAQ section. When I wanted
> to load the individual recode file, I couldn't because there were too many
> variables. As a result, I used this program. You can find more info on their
> website.
>
> 2.) I did as you suggested. I loaded the whole household member data, merged
> it with the weight file, and did NOT use the command keep, only the drop command
> to remove the _merge variable. Otherwise I cannot merge it with the individual
> file. I tried, and it gave me an error message: _merge already defined.
>
> So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
> wrote the following command:
>
> merge clnr hhnr lnr using ir
>
> variables clnr hhnr lnr do not uniquely identify observations in the master data
> caseid was str12 now str15
>
>
> tab _merge
>
>     _merge |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          1 |     23,199       78.11       78.11
>          2 |      3,100       10.44       88.55
>          3 |      3,402       11.45      100.00
> ------------+-----------------------------------
>      Total |     29,701      100.00
>
>
> Now comes a tricky part for me. Using the following commands doesn't give me
> the desired results.
> keep if _merge==3
> drop _merge
>
> Just as in the former case, tabulating hv105 (= age of household member) in
> this file gives exactly the same answer: only the ages of children, 0-5 years,
> are included.
>
> But if I don't use the commands keep or drop, I have the ages of ALL household
> members.
>
> My question is: should I keep the "_merge" variable? From what I have been
> reading, I thought the purpose of merge was to keep only observations with
> _merge==3.
>
> 3.) In your former email you say that I drop all children without height and
> weight data and all adults, including parents. In my analysis, the dependent
> variable is child health, measured by the height-for-age Z-score and the
> weight-for-age Z-score. For the children who have these Z-scores, I need to
> match them with their respective parents' education, age, household
> characteristics, etc., to see whether mothers and fathers with higher education
> have children with better health as measured by the Z-scores. Therefore,
> shouldn't the way I was doing it be correct? Or have I misunderstood completely?
>
>
> Thanks
> Tharshini
>
> On 2009-07-28, at 15:43, Friedrich Huebler wrote:
>> Tharshini,
>>
>> In step 3 you -drop- all children without height and weight data and
>> all adults, including all parents.
>>
>> You write "The household member data includes too many variables to
>> directly upload in stata." The flat household member recode file from
>> the Ghana DHS 2003 has 245 variables. The only version of Stata that
>> cannot hold 245 variables is Small Stata. Your -tab- output indicates
>> that you do not have Small Stata because you were able to work with
>> more than 26000 observations (see -help limits-). You should therefore
>> be able to open the complete household member file with Stata. I don't
>> know a program called "select" but it does not seem to be necessary.
>>
>> Friedrich
>>
>> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
>> Thangavelu<thth4658@student.su.se> wrote:
>>> Hi Friedrich,
>>>
>>> When I downloaded the dataset for Ghana 2003, there was a doc file with the
>>> height and weight file: a description of how to proceed when merging and which
>>> identifying variables to choose in each file. I followed this doc file and
>>> merged the files in the following way:
>>>
>>> 1.) The height and weight file for children up to 5 years old.
>>> rename HWHHID caseid
>>> rename HWLINE linenr
>>> sort caseid linenr
>>> save weight, replace
>>> clear exit
>>>
>>> 2.) The household member data includes too many variables to directly upload in
>>> stata, so I used the program "select" to choose my variables of interest. Then
>>> I loaded the file into Stata:
>>>
>>> use hmr1
>>> rename hhid caseid
>>> rename hvidx linenr
>>> sort caseid linenr
>>> save hmr1, replace
>>>
>>> 3.) These two files were then merged (master data = hmr1):
>>>
>>> merge caseid linenr using weight
>>>
>>> tab _merge
>>>
>>>     _merge |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>          1 |     22,673       85.23       85.23
>>>          3 |      3,928       14.77      100.00
>>> ------------+-----------------------------------
>>>      Total |     26,601      100.00
>>>
>>> . keep if _merge ==3
>>> (22673 observations deleted)
>>>
>>> . drop _merge
>>>
>>> Error message : linenr was byte now int
>>>
>>> My own conclusion: Since _merge==3 has 3,928 observations, exactly the same
>>> number as in the weight file, I concluded that the merge was done correctly.
>>> I also tried the inverse case, i.e. having hmr as my master data.
>>>
>>> 4.) I then merged the resulting file with the individual recode file
>>> (= women's file), using cluster number (clnr hv001), household number
>>> (hhnr hv002) and mother's line number (lnr hc60).
>>>
>>> In the resulting file, I again renamed the identifying variables
>>> rename HV001 clnr
>>> rename HV002 hhnr
>>> rename hc60  lnr
>>> sort clnr hhnr lnr
>>> save thesis
>>> clear exit
>>>
>>> 5.) In the individual recode file, just as in the household member recode file,
>>> I used the program "select" to choose the variables, and I renamed the following
>>> identifying variables: cluster number (clnr v001), household number (hhnr v002)
>>> and respondent's line number (lnr v003).
>>>
>>> use ir1
>>> rename V001 clnr
>>> rename V002 hhnr
>>> rename V003 lnr
>>> sort clnr hhnr lnr
>>> save ir1, replace
>>>
>>> 6.) Now I merge ir1.dta with thesis.dta:
>>>
>>> merge clnr hhnr lnr using thesis
>>> tab _merge
>>>
>>>     _merge |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>          1 |        526        7.48        7.48
>>>          2 |      3,100       44.11       51.59
>>>          3 |      3,402       48.41      100.00
>>> ------------+-----------------------------------
>>>      Total |      7,028      100.00
>>>
>>> . keep if _merge == 3
>>> (3626 observations deleted)
>>>
>>> . drop _merge
>>>
>>> Error message: variables clnr hhnr lnr do not uniquely identify observations in
>>> the master data. I hope this will help to solve the problem.
>>>
>>> / Tharshini
>>>
>>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>>> Tharshini,
>>>>
>>>> On June 11 you wrote that you wanted to merge the household member
>>>> file with the height and weight file. In response to your message you
>>>> received advice on how you can merge the data. The table in your
>>>> message of today makes clear that you did not merge the files
>>>> correctly because you only have persons up to 5 years of age. If you
>>>> want more help with this and the other problems you described you have
>>>> to show us your code, as explained in the Statalist FAQ.
>>>>
>>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>>
>>>> Friedrich
>>>>
>>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>>
>>>>> .tab hv105
>>>>>   Age of |
>>>>>  household |
>>>>>    members |      Freq.     Percent        Cum.
>>>>> ------------+-----------------------------------
>>>>>          0 |        772       22.69       22.69
>>>>>          1 |        706       20.75       43.45
>>>>>          2 |        655       19.25       62.70
>>>>>          3 |        689       20.25       82.95
>>>>>          4 |        553       16.26       99.21
>>>>>          5 |         27        0.79      100.00
>>>>> ------------+-----------------------------------
>>>>>      Total |      3,402      100.00



