From: Friedrich Huebler <fhuebler@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: merging the newly created height and weight DHS data sets with child data set
Date: Thu, 22 Aug 2013 11:38:32 -0400

Donela,

The variables v001, v002 and b16 do not uniquely identify observations
in the child data from the Ethiopia DHS 2005.

. duplicates report v001 v002 b16

Duplicates in terms of v001 v002 b16

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |         9672             0
        2 |          148            74
        3 |           21            14
        4 |           20            15
--------------------------------------

Some children are not included in the household member file and
therefore have the value 0 or a missing value in b16. You can correct
this with the commands below.

. drop if b16==0 | b16==.
(1006 observations deleted)

. duplicates report v001 v002 b16

Duplicates in terms of v001 v002 b16

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |         8855             0
--------------------------------------

Friedrich

On Thu, Aug 22, 2013 at 10:43 AM, Eric A. Booth <eric.a.booth@gmail.com> wrote:
> <>
>
> Try opening each of the files and then seeing if the variables you are
> merging on uniquely identify the records (we already can guess that
> they do not, but now you want to find out why). You'll have to
> investigate your data and make the call about how you deal with your
> merge variables.
>
> If there are many records per identifier (or variables that together
> uniquely identify the records you are merging) in the 'using'
> datafile, determine: do you want to merge all of them in? do you
> want to aggregate/collapse them first in some way? do you want to
> reshape them from long to wide before merging them in?
>
> All of this requires fully understanding your data structure based on
> examining the data and the codebook and the extensive documentation
> that DHS provides (I'm not familiar with these, but I've seen the
> website and there is a lot there on this topic).
>
> To check if the variables you are merging on are unique identifiers,
> start with commands like -isid- and -duplicates tag- and then
> investigate the duplicates to figure out what you want to do with them
> for the merge.
>
> Also, there is a lot of information on merging DHS data in Stata on
> the internet, e.g.,
>
> http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/example9
> http://userforum.measuredhs.com/index.php?t=rview&goto=217&th=49
> http://statalist.1588530.n2.nabble.com/st-DHS-Ghana-merging-question-td3128338.html
> http://www.stata.com/statalist/archive/2009-11/msg00240.html
>
> - Eric
>
>
> On Thu, Aug 22, 2013 at 9:26 AM, Donela Besada <dbesada85@gmail.com> wrote:
>> Hi Nick,
>>
>> Thanks for the response. To be honest, I am not sure, but I am
>> assuming that they have to be, since they are picking the same
>> variables in each of the data sets to merge on, they are just called
>> something different in each data set.
>>
>> I don't really know what other options to try. I am trying to follow
>> the instructions, but they are not very clear:
>>
>> I have pasted here the full instructions:
>>
>> HEIGHT AND WEIGHT – WHO CHILD GROWTH STANDARD FILES
>>
>> The Height and Weight – WHO Child Growth Standard files contain the
>> standard deviations for the height for age, weight for age, weight for
>> height, and BMI according to the new WHO definition. These data are
>> available in the standard distributed DHS-V data files but not for
>> previous recode definitions. The new WHO scores for recode
>> definitions 4 and below need to be merged with the corresponding
>> standard recode files for analysis purposes.
>>
>> In the early phases of DHS, the Height and Weight data was collected
>> for children of interviewed women; but in the last two rounds the data
>> were collected for all children in the households. Variable HWLEVEL
>> in this file indicates whether the anthropometry data was collected at
>> the household or the woman’s level. Code 1 in that variable indicates
>> that height and weight was collected at the household level and code 2
>> indicates that it was collected at the woman’s level.
>>
>> The Height and Weight data collected for children of interviewed women
>> can be merged with either their mothers or the children themselves as
>> follows:
>>
>> Use HWCASEID from the Height and Weight file with CASEID from the
>> Individual Recode to merge it with the mother’s data.
>>
>> Use HWCASEID and HWLINE, from the Height and Weight file, with CASEID
>> and MIDX, from the Children's recode file to merge it with the
>> Children’s data.
>>
>> Use HWCASEID and HWLINE, from the Height and Weight file, with CASEID
>> and BIDX, from the Births Recode file to merge it with the Births’
>> data.
>>
>> The Height and Weight data collected for children at the household
>> level can be matched to the households, to the members, to the
>> mothers, or to the children.
>>
>> Use HWHHID from the Height and Weight data file with HHID from the
>> Household Recode file to merge it with the household data where the
>> child was measured.
>>
>> Use HWHHID and HWLINE from the Height and Weight file with HHID and
>> HVIDX from the Members Recode file to merge it with the household
>> member data.
>>
>> Once the Height and weight data are merged to the household members’
>> file, the resulting file could then be merged with the mother’s and
>> children’s file, as follows:
>>
>> Use HV001 (cluster number) plus HV002 (household number) and HC60
>> (mother’s line number) from the constructed file and merge it with the
>> corresponding V001, V002 and V003 from the Individual Recode file.
>>
>> Use HV001 (cluster number) plus HV002 (household number) and HWLINE
>> from the member’s constructed file (or the one resulting from the
>> previous merge), and merge it with the corresponding V001, V002, B16
>> (child’s line number in the household) in the Children Recode file or
>> in the Births Recode file.
>>
>>
>> On Thu, Aug 22, 2013 at 4:14 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> The devil is in the details. This is not my field at all but are _all_
>>> these -merge-s really 1:1?
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 22 August 2013 15:10, Donela Besada <dbesada85@gmail.com> wrote:
>>>> Hello, thank you for your response.
>>>>
>>>> So this is what I did:
>>>> I first opened the height and weight data set and renamed the two
>>>> variables I need to merge on so that they correspond to the name in
>>>> the household member data set
>>>>
>>>> gen hhid=hwhhid
>>>> gen hvidx=hwline
>>>>
>>>> Then I opened the household member data set and I did merge it with
>>>> the anthropometric data set
>>>> Type of merge: one to one on key variables
>>>>
>>>> merge 1:1 hvidx hhid using
>>>> "/Users/donelabesada/Desktop/IHSS/Ethiopia/ETHW51DT_2005
>>>> anthro/ETHW51FL.DTA"
>>>>
>>>>     Result                           # of obs.
>>>>     -----------------------------------------
>>>>     not matched                        62,591
>>>>         from master                    62,591  (_merge==1)
>>>>         from using                          0  (_merge==2)
>>>>
>>>>     matched                             4,949  (_merge==3)
>>>>     -----------------------------------------
>>>>
>>>>
>>>> After that I renamed the variables I am instructed to merge on to
>>>> reflect the same variables in the child data set:
>>>>
>>>> rename hv001 v001
>>>> rename hv002 v002
>>>> rename hvidx b16
>>>>
>>>> I then saved that file and opened the child data set. I then tried to
>>>> merge this child data set with my newly created merged file, again one
>>>> to one on key variables
>>>> merge 1:1 v001 v002 b16 using
>>>> "/Users/donelabesada/Desktop/ETPR51FL_householdmember_anthro_merge.dta"
>>>>
>>>> When I did this I got the below error:
>>>>
>>>> variables v001 v002 b16 do not uniquely identify observations in the master data
>>>>
>>>> I am using Ethiopia 2005 data for this merging.
>>>>
>>>> I would appreciate any help anyone can offer.
>>>>
>>>> Thank you very much.
>>>>
>>>> Warm wishes,
>>>> Donela
>>>>
>>>> On Thu, Aug 22, 2013 at 4:01 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> Please show the -merge- command you typed to increase the chance of a
>>>>> good answer.
>>>>> Nick
>>>>> njcoxstata@gmail.com
>>>>>
>>>>>
>>>>> On 22 August 2013 14:55, Donela Besada <dbesada85@gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I was wondering if anyone could help me. I am trying to follow the WHO
>>>>>> instructions on how to merge the new height and age data sets with the
>>>>>> original child data sets. I am able to successfully merge the height
>>>>>> and weight data with the household member data. I am having some
>>>>>> trouble with the second step of merging that data set to the child
>>>>>> data set. When I try to merge on the variables: v001, v002 and b16, I
>>>>>> get the following error:
>>>>>>
>>>>>> "variables v001 v002 b16 do not uniquely identify observations in the
>>>>>> master data"
>>>>>>
>>>>>> Has anyone successfully done this and could you please help if so?
>>>>>>
>>>>>>
>>>>>> Thank you very much
>>>>>> Donela
>>>>>>
>>>>>> WHO instructions:
>>>>>>
>>>>>> The Height and Weight data collected for children at the household
>>>>>> level can be matched to the households, to the members, to the
>>>>>> mothers, or to the children.
>>>>>>
>>>>>> Use HWHHID and HWLINE from the Height and Weight file with HHID and
>>>>>> HVIDX from the Members Recode file to merge it with the household
>>>>>> member data.
>>>>>>
>>>>>> Once the Height and weight data are merged to the household members’
>>>>>> file, the resulting file could then be merged with the mother’s and
>>>>>> children’s file, as follows:
>>>>>>
>>>>>> Use HV001 (cluster number) plus HV002 (household number) and HWLINE
>>>>>> from the member’s constructed file (or the one resulting from the
>>>>>> previous merge), and merge it with the corresponding V001, V002, B16
>>>>>> (child’s line number in the household) in the Children Recode file or
>>>>>> in the Births Recode file.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/