Statalist



Re: st: DHS Ghana variable construction question


From   Tharshini Thangavelu <thth4658@student.su.se>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: DHS Ghana variable construction question
Date   Tue, 28 Jul 2009 22:30:53 +0200 (CEST)

Friedrich,

1.) The program "select" was suggested by DHS in their FAQ section. When I
tried to load the individual recode file, I couldn't, because it has too many
variables. As a result, I used this program. You can find more information on
their website.
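For reference, Stata itself can read only selected variables from a .dta file with the syntax -use varlist using filename-, which may make an external selection step unnecessary in some cases (whether it also avoids the variable limit of a particular Stata flavor should be checked with -help limits-). A minimal sketch with a toy file standing in for the individual recode file; mage, meduc and unused are hypothetical placeholder variables, not DHS names.

```stata
* Toy stand-in for the individual recode file (hypothetical variables)
clear
input byte v001 byte v002 byte v003 byte mage byte meduc byte unused
1 1 2 28 1 0
1 2 1 35 2 0
end
tempfile irtoy
save `irtoy'

* Read only the identifiers and the variables of interest
use v001 v002 v003 mage meduc using `irtoy', clear
describe, short
```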

2.) I did as you suggested. I loaded the whole household member file, merged it
with the weight file, and did NOT use -keep-; I only used -drop- to remove the
_merge variable, because otherwise I cannot merge the result with the individual
file. I tried, and Stata gave me the error message "_merge already defined".

So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
ran the following command:

merge clnr hhnr lnr using ir

variables clnr hhnr lnr do not uniquely identify observations in the master data
caseid was str12 now str15


tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     23,199       78.11       78.11
          2 |      3,100       10.44       88.55
          3 |      3,402       11.45      100.00
------------+-----------------------------------
      Total |     29,701      100.00

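The note "variables clnr hhnr lnr do not uniquely identify observations in the master data" is expected here: the master file has one row per household member, lnr holds the mother's line number (hc60), and siblings share the same mother, so the combination clnr hhnr lnr repeats (presumably hc60 is also missing for everyone without anthropometry, which adds further duplicates). A minimal sketch with toy data (the haz values are invented) showing how to check this with -duplicates report- and -isid- before merging:

```stata
* Toy children file: lnr is the mother's line number, so siblings repeat it
clear
input byte clnr byte hhnr byte lnr float haz
1 1 2 -1.2
1 1 2 -0.4
1 2 1  0.3
end

* How many observations share each key combination?
duplicates report clnr hhnr lnr

* isid stops with an error if the keys are not unique; capture keeps going
capture isid clnr hhnr lnr
display "isid return code: " _rc
```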

Now comes the tricky part for me. The following commands do not give me the
desired results:
keep if _merge==3
drop _merge

When I tabulate hv105 (age of household member) in this file, I get exactly the
same answer as before: only children aged 0-5 years are included.

But if I don't use -keep- or -drop-, I have the ages of ALL household members.

My question is: should I keep the _merge variable? From what I have read, I
thought that after -merge- one should only keep observations with _merge==3.
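On the _merge question: after a merge, _merge==1 marks observations found only in the master file, 2 only in the using file, and 3 in both. -keep if _merge==3- therefore discards every household member without a height and weight record, which is why only children aged 0-5 remain. A minimal sketch with toy data that keeps all roster members instead, assuming Stata 11 or later (the -merge 1:1- syntax and the keep() option are not part of the older -merge varlist using- syntax used in this thread); haz is a hypothetical Z-score variable.

```stata
* Toy household roster (master): two adults and two children
clear
input str3 caseid byte linenr byte hv105
"h1" 1 32
"h1" 2 29
"h1" 3 2
"h1" 4 0
end
tempfile roster
save `roster'

* Toy height/weight file (using): only the two children
clear
input str3 caseid byte linenr float haz
"h1" 3 -1.1
"h1" 4  0.2
end
tempfile hw
save `hw'

* Keep all roster members (_merge==1) and the matched children (_merge==3)
use `roster', clear
merge 1:1 caseid linenr using `hw', keep(master match)
tab _merge
drop _merge

* Adults are still present; their haz is simply missing
tab hv105
```

Dropping _merge afterwards, as in step 2 above, is what allows a second merge without the "_merge already defined" error.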

3.) In your previous email you say that I drop all children without height and
weight data and all adults, including parents. In my analysis, the dependent
variable is child health, measured by the height-for-age and weight-for-age
Z-scores. For the children who have these Z-scores, I need to match them with
their parents' education and age, household characteristics, etc., to see
whether mothers and fathers with higher education have children with better
health as measured by the Z-scores. Therefore, shouldn't the way I was doing it
be correct? Or have I completely misunderstood?
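The matching described here is a many-to-one merge: each child row carries the mother's line number, several children can share one mother, and each woman appears once in the individual recode file. A minimal sketch, assuming Stata 11 or later and the renamed identifiers clnr, hhnr and lnr from this thread; medu is a hypothetical mother's-education variable and all values are invented. Keeping master and matched observations retains children whose mother was not interviewed, with missing mother variables.

```stata
* Toy women's file: one row per woman, identified by clnr hhnr lnr
clear
input byte clnr byte hhnr byte lnr byte medu
1 1 2 1
1 2 1 3
end
isid clnr hhnr lnr
tempfile women
save `women'

* Toy children file: lnr holds the mother's line number (hc60)
clear
input byte clnr byte hhnr byte lnr float haz
1 1 2 -1.2
1 1 2 -0.4
1 2 1  0.3
1 3 2 -2.0
end

* Many children to one mother: m:1 with the children file as master
merge m:1 clnr hhnr lnr using `women', keep(master match)
tab _merge
```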


Thanks
Tharshini







On 2009-07-28, at 15:43, Friedrich Huebler wrote:
> Tharshini,
>
> In step 3 you -drop- all children without height and weight data and
> all adults, including all parents.
>
> You write "The household member data includes too many variables to
> directly upload in Stata." The flat household member recode file from
> the Ghana DHS 2003 has 245 variables. The only version of Stata that
> cannot hold 245 variables is Small Stata. Your -tab- output indicates
> that you do not have Small Stata because you were able to work with
> more than 26000 observations (see -help limits-). You should therefore
> be able to open the complete household member file with Stata. I don't
> know a program called "select" but it does not seem to be necessary.
>
> Friedrich
>
> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
> Thangavelu<thth4658@student.su.se> wrote:
>> Hi Friedrich,
>>
>> When I downloaded the dataset for Ghana 2003, it included a documentation file
>> for height and weight, describing how to proceed when merging and which
>> identifying variables to choose in each file. I followed this documentation
>> and merged the files in the following way:
>>
>> 1.) The height and weight file for children up to 5 years old.
>> rename HWHHID caseid
>> rename HWLINE linenr
>> sort caseid linenr
>> save weight, replace
>> clear exit
>>
>> 2.) The household member data includes too many variables to directly upload in
>> Stata, so I used the program "select" to choose my variables of interest.
>> Then I loaded the file in Stata:
>>
>> use hmr1
>> rename hhid caseid
>> rename hvidx linenr
>> sort caseid linenr
>> save hmr1, replace
>>
>> 3.) These two files were then merged together (master data = hmr1):
>>
>> merge caseid linenr using weight
>>
>> tab _merge
>>
>>     _merge |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>          1 |     22,673       85.23       85.23
>>          3 |      3,928       14.77      100.00
>> ------------+-----------------------------------
>>      Total |     26,601      100.00
>>
>> . keep if _merge ==3
>> (22673 observations deleted)
>>
>> . drop _merge
>>
>> Error message : linenr was byte now int
>>
>> My own conclusion: since _merge==3 for 3,928 observations, exactly the same
>> number of observations as in the weight file, I concluded that the merge was
>> done correctly. I also tried the inverse case, i.e. having hmr as my master data.
>>
>> 4.) Next, I prepared this resulting file for merging with the individual recode
>> file (women's file) on cluster number (clnr = hv001), household number
>> (hhnr = hv002) and mother's line number (lnr = hc60).
>>
>> In the resulting file, I again renamed the identifying variables:
>> rename HV001 clnr
>> rename HV002 hhnr
>> rename hc60  lnr
>> sort clnr hhnr lnr
>> save thesis
>> clear exit
>>
>> 5.) In the individual recode file, just as in the household member recode file, I
>> used the program "select" to choose the variables, and renamed the following
>> identifying variables: cluster number (clnr = v001), household number
>> (hhnr = v002) and respondent's line number (lnr = v003).
>>
>> use ir1
>> rename V001 clnr
>> rename V002 hhnr
>> rename V003 lnr
>> sort clnr hhnr lnr
>> save ir1, replace
>>
>> 6.) Now I merge ir1.dta with thesis.dta:
>>
>> merge clnr hhnr lnr using thesis
>> tab _merge
>>
>>     _merge |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>          1 |        526        7.48        7.48
>>          2 |      3,100       44.11       51.59
>>          3 |      3,402       48.41      100.00
>> ------------+-----------------------------------
>>      Total |      7,028      100.00
>>
>> . keep if _merge == 3
>> (3626 observations deleted)
>>
>> . drop _merge
>>
>> Error message: variables clnr hhnr lnr do not uniquely identify observations in
>> the master data. I hope this will help to solve the problem.
>>
>> / Tharshini
>>
>>
>>
>>
>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>> Tharshini,
>>>
>>> On June 11 you wrote that you wanted to merge the household member
>>> file with the height and weight file. In response to your message you
>>> received advice on how you can merge the data. The table in your
>>> message of today makes clear that you did not merge the files
>>> correctly because you only have persons up to 5 years of age. If you
>>> want more help with this and the other problems you described you have
>>> to show us your code, as explained in the Statalist FAQ.
>>>
>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>
>>> Friedrich
>>>
>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>
>>>> . tab hv105
>>>>      Age of |
>>>>  household |
>>>>    members |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>          0 |        772       22.69       22.69
>>>>          1 |        706       20.75       43.45
>>>>          2 |        655       19.25       62.70
>>>>          3 |        689       20.25       82.95
>>>>          4 |        553       16.26       99.21
>>>>          5 |         27        0.79      100.00
>>>> ------------+-----------------------------------
>>>>      Total |      3,402      100.00
>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
Tharshini THANGAVELU
Forskarbacken 8 / 101
114 16 Stockholm
Sweden
Phone +46 (0)735 53 43 90
E-mail thth4658@student.su.se




© Copyright 1996–2014 StataCorp LP