Statalist



Re: st: DHS Ghana variable construction question


From   Tharshini Thangavelu <thth4658@student.su.se>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: DHS Ghana variable construction question
Date   Wed, 29 Jul 2009 18:15:58 +0200 (CEST)

Friedrich,

Thanks! I followed the new approach, which gave the same results as one of my
earlier attempts. I ran it on the original dataset, i.e. the household member
recode. It seems strange that mother's age has a minimum value of 5. When I
tabulated it, only one observation had the value 5. I assumed that it should
be a missing value and replaced it.

sum mage fage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mage |      9411    35.21177    8.643875          5         76
        fage |      7265    44.92953    12.64342         19         99

________________________________________________________________________
The following output shows the difference between pairs of variables that
should normally be the same. I still have not figured out why they differ.
 

. sum v730 fage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        v730 |      4463    40.52028    11.91459         18         99
        fage |      7265    44.92953    12.64342         19         99

. sum v447a mage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       v447a |      6502    29.45709    9.297519         15         49
        mage |      9409     35.2182    8.633561         15         76
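
One way to look into such a discrepancy is to compare the two variables
observation by observation rather than only through their summaries. Below is a
minimal sketch on made-up toy data (the four rows are invented for illustration
and are not from the Ghana DHS); it assumes both variables sit in the same
merged dataset.

```stata
* Toy data, invented for illustration only
clear
input v447a mage
29 29
35 36
.  41
22 .
end

* compare reports how often the two variables agree, differ, or are missing
compare v447a mage

* Difference, computed only where both values are present
gen diff = mage - v447a if !missing(mage, v447a)

* List the rows where the two ages disagree
list if diff != 0 & !missing(diff)
```

The rows where only one of the two variables is missing would also explain part
of the difference in the number of observations.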



Until now I have used v730 (partner's age), which I assumed to be the father's
age, and v447a (woman's age in years from the household report) as the mother's
age. For education I used hc62 and v702 respectively. The method of finding the
mother's and father's age by sorting on hhid and hvidx is new to me and
confusing.

How do I now find the mother's and father's education level?
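
My guess is that the same subscripting idea would carry over to education. The
sketch below is only an illustration on invented toy data. It assumes that hv106
is the education-level variable in the household member recode (please check the
codebook), that hhid identifies the household, and that the line numbers in
hvidx run 1, 2, 3, ... within each household without gaps, so that the k-th row
of a household is the member with line number k. The names medu and fedu are my
own.

```stata
* Toy household member data, invented for illustration only
clear
input hhid hvidx hv105 hv106 hv112 hv114
1 1 42 2 . .
1 2 36 1 . .
1 3 4  0 2 1
1 4 2  0 2 1
2 1 28 3 . .
2 2 1  0 0 1
end

* Guard: the subscript trick needs row position == line number in each household
bysort hhid (hvidx): assert hvidx == _n

* Under by:, hv106[k] is the k-th member of the same household.
* A subscript of 0 or missing (parent not in HH) yields a missing value.
bysort hhid (hvidx): gen medu = hv106[hv112]
bysort hhid (hvidx): gen fedu = hv106[hv114]

list, sepby(hhid)
```

If the assert fails in the real data, the line numbers have gaps and this
shortcut would pick the wrong household member.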


Does this mean that I don't have to merge with the individual recode file once I
have merged the anthropometric and household member data?

I think I am getting rather confused about how to work with micro-level data. I
did run some regressions, but on the dataset that had only 3402 observations.
That is, I had kept only the observations with _merge == 3 (present in both
master and using data) and then dropped the _merge variable.
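
If I understand the earlier advice correctly, the alternative is to keep every
household member after the merge and only mark the children who have
measurements. A minimal sketch on invented toy data follows; the variable names
haz and measured are my own, and the merge 1:1 syntax assumes Stata 11 or later
(older versions would use the sort and merge syntax shown elsewhere in this
thread).

```stata
* Toy height and weight file, invented for illustration only
clear
input caseid linenr haz
1 3 -120
1 4 -45
end
tempfile weight
save `weight'

* Toy household member file with all members of household 1
clear
input caseid linenr hv105 hv112
1 1 42 .
1 2 36 .
1 3 4  2
1 4 2  2
end

* Keep all members; flag matched children instead of using keep if _merge == 3
merge 1:1 caseid linenr using `weight'
gen byte measured = (_merge == 3)
drop _merge

* Parents are still in the data, so their characteristics can be looked up
bysort caseid (linenr): gen mage = hv105[hv112]

* Restrict the analysis, not the dataset, to the measured children
summarize mage haz if measured
```

The regressions could then be run with "if measured" instead of on a file that
no longer contains the parents.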

Tharshini


On 2009-07-29, at 15:12, Friedrich Huebler wrote:
> Tharshini,
>
> Your excerpt from the data shows that you changed the sort order
> before you created the variables mage and fage. Try this:
>
> bysort hhid (hvidx): gen mage = hv105[hv112]
> bysort hhid (hvidx): gen fage = hv105[hv114]
>
> Friedrich
>
> On Wed, Jul 29, 2009 at 8:21 AM, Tharshini
> Thangavelu<thth4658@student.su.se> wrote:
>> Hi,
>>
>>
>> I have been trying to produce the correct mother's and father's ages. I came
>> across the following advice on Statalist.
>>
>> http://www.stata.com/statalist/archive/2006-06/msg00323.html
>>
>> However, my dataset seems a bit strange. The following variables are used to
>> create mother's and father's age. I still haven't produced satisfactory
>> results.
>>
>> hhid      hv104     hv105     hv112     hv114     mage     fage
>>
>> 1 1         2         4         2         1       10        4
>> 1 1         1        10         2         1       10        4
>> 1 1         1        42         .         .
>> 1 1         2        36         .         .
>> 1 1         2         2         2         1       10        4
>> 1 2         1        28         .         .
>> 1 4         1        33         .         .
>> 1 5         2        24         .         .        .        .
>> 1 6         1        12         0         0        .        .
>>
>> The 0's in hv112 and hv114 denote mother not in HH and father not in HH,
>> respectively.
>>
>> Here are several things that I don't understand:
>> 1.) hhid is the case identification, where hh denotes the household number
>> and id the household member. I have only shown household no. 1. There are 5
>> rows for the first member in household no. 1, followed by members no. 2, 4, 5
>> and 6. There is no id equal to 3. Why are there 5 rows in hhid for household
>> member 1?
>>
>> 2.) I use the following commands to produce mage and fage
>>
>> . by hhid : gen mage = hv105[hv112]
>> . by hhid : gen fage = hv105[hv114]
>>
>> I created the mother's and father's age variables in the household member data
>> directly after loading it into Stata. I have therefore not yet merged this
>> dataset, which I will do at a later stage. I thought that creating the parents'
>> ages in the original data would be better than doing it in the merged dataset,
>> although something tells me that it should produce the same results.
>>
>> You can see from the table above that the results cannot be right: the
>> mother's age cannot be 10 years, nor the father's 4 years, for the first hhid.
>>
>> I used the solution proposed at the above link, but the assert command did not
>> work. I don't understand what can be wrong! If anyone has come across this
>> problem working with micro-level data, any help would be valuable!
>>
>> Thanks a lot!
>>
>>
>>
>>
>> ------>why are there 4 id for the first household member?
>>
>>
>>
>>
>> On 2009-07-29, at 01:45, Friedrich Huebler wrote:
>>> Tharshini,
>>>
>>> Please read the documentation for -merge- to understand how it works.
>>> Do not -drop- anything after -merge- besides the _merge variable. You
>>> have to keep all household members if you want to assign the parents'
>>> ages and other characteristics to a child. How to do that was
>>> explained in a previous post.
>>>
>>> http://www.stata.com/statalist/archive/2009-06/msg00793.html
>>>
>>> Friedrich
>>>
>>> On Tue, Jul 28, 2009 at 4:30 PM, Tharshini
>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>
>>>> Friedrich,
>>>>
>>>> 1.) The program "select" was suggested by DHS in their FAQ section. When I
>>>> wanted to load the individual recode file, I couldn't because there were too
>>>> many variables. As a result, I used this program. You can find more info on
>>>> their website.
>>>>
>>>> 2.) I did as you suggested. I loaded the whole household member data, merged
>>>> it with the weight file, and did NOT use the keep command, only the drop
>>>> command to remove the _merge variable. Otherwise I cannot merge it with the
>>>> individual file. I tried, and it gave me an error message: _merge already
>>>> defined.
>>>>
>>>> So I dropped the _merge variable from the resulting file (uppsats.dta). Then
>>>> I ran the following command:
>>>>
>>>> merge clnr hhnr lnr using ir
>>>>
>>>> variables clnr hhnr lnr do not uniquely identify observations in the master
>>>> data
>>>> caseid was str12 now str15
>>>>
>>>>
>>>> tab _merge
>>>>
>>>>     _merge |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>          1 |     23,199       78.11       78.11
>>>>          2 |      3,100       10.44       88.55
>>>>          3 |      3,402       11.45      100.00
>>>> ------------+-----------------------------------
>>>>      Total |     29,701      100.00
>>>>
>>>>
>>>> Now comes a tricky part for me. Using the following commands doesn't give me
>>>> the desired results.
>>>> keep if _merge==3
>>>> drop _merge
>>>>
>>>> With this file, just as in the former case, tabulating hv105 (= age of
>>>> household member) gives exactly the same answer: only children's ages, 0-5
>>>> years, are included.
>>>>
>>>> But if I don't use the keep or drop commands, I have the ages of ALL household
>>>> members.
>>>>
>>>> My question is: should I keep the "_merge" variable? From what I have been
>>>> reading, I thought the point of merge was to keep only _merge == 3.
>>>>
>>>> 3.) In your previous email you say that I drop all children without height and
>>>> weight data and all adults, including parents. In my analysis, the dependent
>>>> variable is child health, measured by the height-for-age Z-score and the
>>>> weight-for-age Z-score. For the children who have these Z-scores, I need to
>>>> match them with their respective parents' education and age and with household
>>>> characteristics etc., to see whether mothers and fathers with higher education
>>>> have children with better health as measured by the Z-scores. Therefore,
>>>> shouldn't the way I was doing it be correct? Or have I misunderstood
>>>> completely?
>>>>
>>>>
>>>> Thanks
>>>> Tharshini
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2009-07-28, at 15:43, Friedrich Huebler wrote:
>>>>> Tharshini,
>>>>>
>>>>> In step 3 you -drop- all children without height and weight data and
>>>>> all adults, including all parents.
>>>>>
>>>>> You write "The household member data includes to many variables to
>>>>> directly upload in stata." The flat household member recode file from
>>>>> the Ghana DHS 2003 has 245 variables. The only version of Stata that
>>>>> cannot hold 245 variables is Small Stata. Your -tab- output indicates
>>>>> that you do not have Small Stata because you were able to work with
>>>>> more than 26000 observations (see -help limits-). You should therefore
>>>>> be able to open the complete household member file with Stata. I don't
>>>>> know a program called "select" but it does not seem to be necessary.
>>>>>
>>>>> Friedrich
>>>>>
>>>>> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
>>>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>>> Hi Friedrich,
>>>>>>
>>>>>> When I downloaded the dataset for Ghana 2003, there was a doc file in the
>>>>>> archive for height and weight: a description of how to proceed when merging
>>>>>> and which identifying variables to choose in each file. I followed this doc
>>>>>> file and merged the files in the following way:
>>>>>>
>>>>>> 1.) The height and weight file for children up to 5 years old.
>>>>>> rename HWHHID caseid
>>>>>> rename HWLINE linenr
>>>>>> sort caseid linenr
>>>>>> save weight, replace
>>>>>> clear exit
>>>>>>
>>>>>> 2.) The household member data includes too many variables to directly
>>>>>> upload in Stata, so I used the program "select", where I selected my
>>>>>> variables of interest. Then I loaded it into Stata:
>>>>>>
>>>>>> use hmr1
>>>>>> rename hhid caseid
>>>>>> rename hvidx linenr
>>>>>> sort caseid linenr
>>>>>> save hmr1, replace
>>>>>>
>>>>>> 3.) These two files were then merged together (master data = hmr1)
>>>>>>
>>>>>> merge caseid linenr using weight
>>>>>>
>>>>>> tab _merge
>>>>>>
>>>>>>     _merge |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>          1 |     22,673       85.23       85.23
>>>>>>          3 |      3,928       14.77      100.00
>>>>>> ------------+-----------------------------------
>>>>>>      Total |     26,601      100.00
>>>>>>
>>>>>> . keep if _merge ==3
>>>>>> (22673 observations deleted)
>>>>>>
>>>>>> . drop _merge
>>>>>>
>>>>>> Error message : linenr was byte now int
>>>>>>
>>>>>> My own conclusion: since _merge == 3 has 3928 observations, which is exactly
>>>>>> the number of observations in the weight file, I concluded that the merge was
>>>>>> done correctly. I also tried the inverse case, i.e. having hmr as my master
>>>>>> data.
>>>>>>
>>>>>> 4.) I then merged the resulting file with the individual recode file
>>>>>> (= women's file), using cluster number (clnr, hv001), household number (hhnr,
>>>>>> hv002) and mother's line number (lnr, hc60).
>>>>>>
>>>>>> In the resulting file, I again renamed the identifying variables
>>>>>> rename HV001 clnr
>>>>>> rename HV002 hhnr
>>>>>> rename hc60  lnr
>>>>>> sort clnr hhnr lnr
>>>>>> save thesis
>>>>>> clear exit
>>>>>>
>>>>>> 5.) In the individual recode file, just as in the household member recode
>>>>>> file, I used the program "select" to choose the variables, and the following
>>>>>> identifying variables were renamed: cluster number (clnr, v001), household
>>>>>> number (hhnr, v002) and respondent's line number (lnr, v003).
>>>>>>
>>>>>> use ir1
>>>>>> rename V001 clnr
>>>>>> rename V002 hhnr
>>>>>> rename V003 lnr
>>>>>> sort clnr hhnr lnr
>>>>>> save ir1, replace
>>>>>>
>>>>>> 6.)Now, I merge the ir1.dta with the thesis.dta
>>>>>>
>>>>>> merge clnr hhnr lnr using thesis
>>>>>> tab _merge
>>>>>>
>>>>>>     _merge |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>          1 |        526        7.48        7.48
>>>>>>          2 |      3,100       44.11       51.59
>>>>>>          3 |      3,402       48.41      100.00
>>>>>> ------------+-----------------------------------
>>>>>>      Total |      7,028      100.00
>>>>>>
>>>>>> . keep if _merge == 3
>>>>>> (3626 observations deleted)
>>>>>>
>>>>>> . drop _merge
>>>>>>
>>>>>> Error message: variables clnr hhnr lnr do not uniquely identify observations
>>>>>> in the master data. I hope this will help to solve the problem.
>>>>>>
>>>>>> / Tharshini
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>>>>>> Tharshini,
>>>>>>>
>>>>>>> On June 11 you wrote that you wanted to merge the household member
>>>>>>> file with the height and weight file. In response to your message you
>>>>>>> received advice on how you can merge the data. The table in your
>>>>>>> message of today makes clear that you did not merge the files
>>>>>>> correctly because you only have persons up to 5 years of age. If you
>>>>>>> want more help with this and the other problems you described you have
>>>>>>> to show us your code, as explained in the Statalist FAQ.
>>>>>>>
>>>>>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>>>>>
>>>>>>> Friedrich
>>>>>>>
>>>>>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>>>>>> Thangavelu<thth4658@student.su.se> wrote:
>>>>>>>>
>>>>>>>> .tab hv105
>>>>>>>>   Age of |
>>>>>>>>  household |
>>>>>>>>    members |      Freq.     Percent        Cum.
>>>>>>>> ------------+-----------------------------------
>>>>>>>>          0 |        772       22.69       22.69
>>>>>>>>          1 |        706       20.75       43.45
>>>>>>>>          2 |        655       19.25       62.70
>>>>>>>>          3 |        689       20.25       82.95
>>>>>>>>          4 |        553       16.26       99.21
>>>>>>>>          5 |         27        0.79      100.00
>>>>>>>> ------------+-----------------------------------
>>>>>>>>      Total |      3,402      100.00
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
Tharshini THANGAVELU
Forskarbacken 8 / 101
114 16 Stockholm
Sweden
Phone +46 (0)735 53 43 90
E-mail thth4658@student.su.se
