Statalist



Re: st: DHS Ghana variable construction question


From   Friedrich Huebler <fhuebler@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: DHS Ghana variable construction question
Date   Wed, 29 Jul 2009 09:12:38 -0400

Tharshini,

Your excerpt from the data shows that you changed the sort order
before you created the variables mage and fage. Try this:

bysort hhid (hvidx): gen mage = hv105[hv112]
bysort hhid (hvidx): gen fage = hv105[hv114]
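
A minimal sketch of why the sort order matters, using a hypothetical
five-member household modeled on your excerpt. The hvidx values are
assumed, because the excerpt does not show them. Under -by-, an explicit
subscript such as hv105[hv112] counts observations within the household,
so it picks the right person only if the observations are sorted by line
number and the line numbers run 1, 2, 3, ... without gaps.

```stata
* Hypothetical household; hvidx values are assumed for illustration
clear
input str3 hhid hvidx hv104 hv105 hv112 hv114
"1 1" 3 2  4 2 1
"1 1" 4 1 10 2 1
"1 1" 1 1 42 . .
"1 1" 2 2 36 . .
"1 1" 5 2  2 2 1
end

* Wrong: keep the order shown in the excerpt, so the subscript picks
* whoever happens to be 1st or 2nd in that order
sort hhid, stable
by hhid: gen mage_bad = hv105[hv112]
by hhid: gen fage_bad = hv105[hv114]

* Right: sort by line number within household before subscripting
bysort hhid (hvidx): gen mage = hv105[hv112]
bysort hhid (hvidx): gen fage = hv105[hv114]

* The subscript relies on hvidx running 1, 2, ... within each household
by hhid (hvidx): assert hvidx == _n

list, sepby(hhid)
```

In this toy household the children get mage_bad = 10 and fage_bad = 4, as
in your table, while mage = 36 and fage = 42 after sorting on hvidx. If
the -assert- fails in the real data, the line numbers have gaps and the
subscript approach needs to be replaced by a lookup on hvidx.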

Friedrich

On Wed, Jul 29, 2009 at 8:21 AM, Tharshini
Thangavelu<thth4658@student.su.se> wrote:
> Hi,
>
>
> I have been trying to work out how to produce the correct mothers' and
> fathers' ages. I came across the following advice on Statalist.
>
> http://www.stata.com/statalist/archive/2006-06/msg00323.html
>
> However, my dataset seems a bit stranger. The following variables are used to
> create the mothers' and fathers' ages. I still haven't produced satisfactory
> results.
>
> hhid   hv104   hv105   hv112   hv114   mage   fage
>
> 1 1        2       4       2       1     10      4
> 1 1        1      10       2       1     10      4
> 1 1        1      42       .       .
> 1 1        2      36       .       .
> 1 1        2       2       2       1     10      4
> 1 2        1      28       .       .
> 1 4        1      33       .       .
> 1 5        2      24       .       .      .      .
> 1 6        1      12       0       0      .      .
>
> The 0's in hv112 and hv114 denote mother not in HH and father not in HH, respectively.
>
> Here are several things that I don't understand:
> 1.) The hhid is the case identification, where hh denotes the household number
> and id the household member. I have only described household no. 1. There are 5
> ids for the first member in household no. 1, followed by member no. 2 and
> members no. 4, 5 and 6. We don't have any id = 3. Why are there 5 values in
> hhid for household member 1?
>
> 2.) I use the following commands to produce mage and fage:
>
> . by hhid: gen mage = hv105[hv112]
> . by hhid: gen fage = hv105[hv114]
>
> I created the mothers' and fathers' age variables in the household member data
> directly after loading it into Stata. Therefore I have not merged this dataset
> yet, which I will do at a later stage. I thought creating the parents' ages in
> the original data would be more advantageous than doing it in the merged
> dataset, although something tells me that it should produce the same results.
>
> You can see from the table above that the results cannot be right. The
> mother's age cannot be 10 years or the father's 4 years for the first hhid.
>
> I used the solution proposed at the above link, but the assert command did not
> work. I don't understand what can be wrong! If anyone has come across this
> problem working with micro-level data, any help would be valuable!
>
> Thanks a lot!
>
> ------> Why are there 4 ids for the first household member?
>
> On 2009-07-29, at 01:45, Friedrich Huebler wrote:
>> Tharshini,
>>
>> Please read the documentation for -merge- to understand how it works.
>> Do not -drop- anything after -merge- besides the _merge variable. You
>> have to keep all household members if you want to assign the parents'
>> ages and other characteristics to a child. How to do that was
>> explained in a previous post.
>>
>> http://www.stata.com/statalist/archive/2009-06/msg00793.html
>>
>> Friedrich
>>
>> On Tue, Jul 28, 2009 at 4:30 PM, Tharshini
>> Thangavelu<thth4658@student.su.se> wrote:
>>>
>>> Friedrich,
>>>
>>> 1.) The program "select" was suggested by DHS in the FAQ section. When I wanted
>>> to load the individual recode file, I couldn't because there were too many
>>> variables. As a result, I used this program. You can find more info on their
>>> website.
>>>
>>> 2.) I did as you suggested. I loaded the whole household member data, merged
>>> it with the weight file, and did NOT use the keep command, only the drop command
>>> to remove the _merge variable. Otherwise I cannot merge it with the individual
>>> file. I tried and it gave me an error message: _merge already defined.
>>>
>>> So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
>>> wrote the following command:
>>>
>>> merge clnr hhnr lnr using ir
>>>
>>> variables clnr hhnr lnr do not uniquely identify observations in the master data
>>> caseid was str12 now str15
>>>
>>>
>>> tab _merge
>>>
>>>     _merge |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>          1 |     23,199       78.11       78.11
>>>          2 |      3,100       10.44       88.55
>>>          3 |      3,402       11.45      100.00
>>> ------------+-----------------------------------
>>>      Total |     29,701      100.00
>>>
>>>
>>> Now comes a tricky part for me. Using the following commands doesn't give me
>>> the desired results.
>>> keep if _merge==3
>>> drop _merge
>>>
>>> Just as in the former case, tabulating hv105 (= age of household member) in
>>> this file gives exactly the same answer: only children's ages, 0-5 years, are
>>> included.
>>>
>>> But if I don't use the keep or drop command, I have the ages of ALL household
>>> members.
>>>
>>> My question is: should I keep the "_merge" variable? According to what I have
>>> been reading, I thought the purpose of merge was to only keep if _merge == 3.
>>>
>>> 3.) In your former email you say that I drop all children without height and
>>> weight data and all adults, including parents. In my analysis, the dependent
>>> variable is child health, measured by the height-for-age Z-score and the
>>> weight-for-age Z-score. For those children having these Z-scores, I need to
>>> match them with their respective parents' education, age, household
>>> characteristics, etc., to see if mothers and fathers with higher education have
>>> children with better health as measured by Z-score. Therefore, shouldn't the
>>> way I was doing it be correct? Or have I misunderstood completely?
>>>
>>>
>>> Thanks
>>> Tharshini
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 2009-07-28, at 15:43, Friedrich Huebler wrote:
>>>> Tharshini,
>>>>
>>>> In step 3 you -drop- all children without height and weight data and
>>>> all adults, including all parents.
>>>>
>>>> You write "The household member data includes to many variables to
>>>> directly upload in stata." The flat household member recode file from
>>>> the Ghana DHS 2003 has 245 variables. The only version of Stata that
>>>> cannot hold 245 variables is Small Stata. Your -tab- output indicates
>>>> that you do not have Small Stata because you were able to work with
>>>> more than 26000 observations (see -help limits-). You should therefore
>>>> be able to open the complete household member file with Stata. I don't
>>>> know a program called "select" but it does not seem to be necessary.
>>>>
>>>> Friedrich
>>>>
>>>> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
>>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>> Hi Friedrich,
>>>>>
>>>>> When I downloaded the dataset for Ghana 2003, there was a doc file in the
>>>>> folder for height and weight: a description of how to proceed when merging and
>>>>> which identifying variables to choose in each file. I followed this doc file
>>>>> and merged the files in the following way:
>>>>>
>>>>> 1.) The height and weight file for children up to 5 years old.
>>>>> rename HWHHID caseid
>>>>> rename HWLINE linenr
>>>>> sort caseid linenr
>>>>> save weight, replace
>>>>> clear exit
>>>>>
>>>>> 2.) The household member data includes to many variables to directly upload in
>>>>> stata, so I used the program "select", where I selected my variables of
>>>>> interest. Then I uploaded in stata;
>>>>>
>>>>> use hmr1
>>>>> rename hhid caseid
>>>>> rename hvidx linenr
>>>>> sort caseid linenr
>>>>> save hmr1, replace
>>>>>
>>>>> 3.) These two files were then merged together (master data = hmr1)
>>>>>
>>>>> merge caseid linenr using weight
>>>>>
>>>>> tab _merge
>>>>>
>>>>>     _merge |      Freq.     Percent        Cum.
>>>>> ------------+-----------------------------------
>>>>>          1 |     22,673       85.23       85.23
>>>>>          3 |      3,928       14.77      100.00
>>>>> ------------+-----------------------------------
>>>>>      Total |     26,601      100.00
>>>>>
>>>>> . keep if _merge ==3
>>>>> (22673 observations deleted)
>>>>>
>>>>> . drop _merge
>>>>>
>>>>> Error message : linenr was byte now int
>>>>>
>>>>> My own conclusion: Since _merge == 3 has 3928 observations, which is exactly
>>>>> the same number of obs. as in the weight file, I concluded the merge was done
>>>>> correctly. I also tried the inverse case, i.e. having hmr as my master data.
>>>>>
>>>>> 4.) I then merged this resulting file with the individual recode file
>>>>> (= women's file), using cluster number (clnr hv001), household number
>>>>> (hhnr hv002) and mother's line number (lnr hc60).
>>>>>
>>>>> In the resulting file, I again renamed the identifying variables:
>>>>> rename HV001 clnr
>>>>> rename HV002 hhnr
>>>>> rename hc60  lnr
>>>>> sort clnr hhnr lnr
>>>>> save thesis
>>>>> clear exit
>>>>>
>>>>> 5.) In the individual recode file, just as in the household member recode
>>>>> file, I used the program "select" to choose the variables, and the following
>>>>> identifying variables were renamed: cluster number (clnr v001), household
>>>>> number (hhnr v002) and respondent's line number (lnr v003).
>>>>>
>>>>> use ir1
>>>>> rename V001 clnr
>>>>> rename V002 hhnr
>>>>> rename V003 lnr
>>>>> sort clnr hhnr lnr
>>>>> save ir1, replace
>>>>>
>>>>> 6.)Now, I merge the ir1.dta with the thesis.dta
>>>>>
>>>>> merge clnr hhnr lnr using thesis
>>>>> tab _merge
>>>>>
>>>>>     _merge |      Freq.     Percent        Cum.
>>>>> ------------+-----------------------------------
>>>>>          1 |        526        7.48        7.48
>>>>>          2 |      3,100       44.11       51.59
>>>>>          3 |      3,402       48.41      100.00
>>>>> ------------+-----------------------------------
>>>>>      Total |      7,028      100.00
>>>>>
>>>>> . keep if _merge == 3
>>>>> (3626 observations deleted)
>>>>>
>>>>> . drop _merge
>>>>>
>>>>> Error message: variables clnr hhnr lnr do not uniquely identify observations in
>>>>> the master data. I hope this will help to solve the problem.
>>>>>
>>>>> / Tharshini
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>>>>> Tharshini,
>>>>>>
>>>>>> On June 11 you wrote that you wanted to merge the household member
>>>>>> file with the height and weight file. In response to your message you
>>>>>> received advice on how you can merge the data. The table in your
>>>>>> message of today makes clear that you did not merge the files
>>>>>> correctly because you only have persons up to 5 years of age. If you
>>>>>> want more help with this and the other problems you described you have
>>>>>> to show us your code, as explained in the Statalist FAQ.
>>>>>>
>>>>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>>>>
>>>>>> Friedrich
>>>>>>
>>>>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>>>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>>>>
>>>>>>> .tab hv105
>>>>>>>   Age of |
>>>>>>>  household |
>>>>>>>    members |      Freq.     Percent        Cum.
>>>>>>> ------------+-----------------------------------
>>>>>>>          0 |        772       22.69       22.69
>>>>>>>          1 |        706       20.75       43.45
>>>>>>>          2 |        655       19.25       62.70
>>>>>>>          3 |        689       20.25       82.95
>>>>>>>          4 |        553       16.26       99.21
>>>>>>>          5 |         27        0.79      100.00
>>>>>>> ------------+-----------------------------------
>>>>>>>      Total |      3,402      100.00
