The Stata listserver

RE: st: String problem.

Subject   RE: st: String problem.
Date   Wed, 7 Sep 2005 11:49:29 -0400

Indeed, it is the case and Uli's suggestion is not working.
In fact, the problem is deeper than that.
I have several Demographic and Health Surveys (DHS) data sets.
They come in different flavors (households, women, children, etc.).
I need to merge the household and women (individual) files.
I proceed as indicated by the DHS team (see below, and also a previous
thread on substr() I posted last week, to which Nick graciously responded).

Unfortunately, for one country there is no hhid in the household file,
although there is a caseid in the women file for that country.
I thus need to generate a hhid (based on cluster and household ID)
in order to merge the household data with the individual data
(in which I can retrieve the hhid variable from caseid using substr()).
The caseid in the individual files contains the hhid information.
The hhid is always 12 characters long, sometimes with leading and trailing
blank spaces (which makes matters worse).
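
The substr() step described above can be sketched as follows. This is only an illustration on made-up data: the two caseid values and the name hhid_trim are assumptions, not taken from any real DHS file.

```stata
* Toy women's file: two invented 15-character case IDs.
clear
set obs 2
generate str15 caseid = "       1  12  2" in 1
replace caseid = "      10   5  1" in 2

* The first 12 characters of caseid are the household ID;
* the remaining 3 are the woman's line number.
generate str12 hhid = substr(caseid, 1, 12)

* A trimmed copy is handy for eyeballing, but it will NOT match
* a 12-character hhid in the household file.
generate hhid_trim = trim(hhid)
list, clean
```

For an exact match on a string key the blanks must be kept as they are, since "1  12" and "       1  12" are different strings.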

I just can't figure out:
- how the DHS team constructs the hhid variable (i.e. with leading, trailing,
and sometimes embedded blanks)
- how to retrieve it in Stata.

One solution, of course, is to build hhid in both the household and individual files using
the cluster and household ID variables (g hhid = hv001*100+hv002).
But I am wondering whether one could create the hhid variable exactly in
the DHS tradition
(i.e. 12 characters long). Because, ultimately, I need to pool my data and a
uniformity is
Said differently, why is it so difficult to create a second hhid2 (without using
that is exactly the same as hhid?
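
One possible way to build a fixed-width string ID is sketched below. The layout is an assumption made purely for illustration (cluster number right-aligned in the first 8 characters, household number right-aligned in the last 4); the real DHS layout may differ from survey to survey. The helper names cl and hh and the data values are also invented.

```stata
* Toy household file with invented cluster and household numbers.
clear
set obs 2
generate hv001 = 1 in 1
replace hv001 = 10 in 2
generate hv002 = 12 in 1
replace hv002 = 5 in 2

* Digits only, with no padding.
generate cl = trim(string(hv001, "%12.0f"))
generate hh = trim(string(hv002, "%12.0f"))

* Right-align each piece in its ASSUMED field width (8 and 4) by
* prefixing blanks and keeping the last 8 (or 4) characters.
generate str12 hhid2 = substr("        " + cl, -8, 8) + substr("    " + hh, -4, 4)
list hv001 hv002 hhid2, clean
```

In a file that does have hhid, something like `assert hhid2 == hhid` would show whether the assumed widths are right for that survey. Note also that the numeric shortcut hv001*100+hv002 is only unique while household numbers stay below 100: cluster 1, household 112 and cluster 2, household 12 both give 212.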

I hope this is clear enough.

Best regards.

       Q: How do I merge household, women, men, and wealth DHS data files?
       A: When merging women's and men's data files with their households, you
       need to use the cluster and household numbers. Since there is a
       "one-to-many" relationship between households and individuals, you should
       start with the individual data, women or men, as your "base" (or "unit of
       analysis") and locate the correct household for each person.
       In the household data, the cluster number is stored in HV001, and the
       household is in HV002. In the women's data, the cluster is V001, the
       household is V002. In the men's data, the equivalent variables are MV001
       and MV002.
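
In Stata, the numeric-key merge the FAQ describes might look like the sketch below. It assumes the merge m:1 syntax of Stata 11 or later; the data values and the variables hhsize and line are invented for the example.

```stata
* Toy household file: one row per household.
clear
input hv001 hv002 hhsize
1 12 4
10 5 6
end
* Give the keys the same names as in the women's file.
rename hv001 v001
rename hv002 v002
tempfile households
save `households'

* Toy women's file: possibly several women per household.
clear
input v001 v002 line
1 12 2
1 12 3
10 5 1
end

* Women are the base; each one looks up her single household.
merge m:1 v001 v002 using `households'
tabulate _merge
```

Starting from the individual file and using m:1 mirrors the FAQ's advice to treat the women as the unit of analysis.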
       Another alternative is to use the household and individual case ID
       variables. The household ID is HHID, and is 12 characters long. In
       general, this variable will consist of the cluster and household numbers,
       but there are exceptions (see notes below). The case ID variable for
       women is CASEID, which is 15 characters long. It consists of the
       household ID for that person, with the person's 3-digit line number
       appended to the end. So in SPSS, you could create a variable TMPID:
       COMPUTE TMPID = SUBSTR(CASEID,1,12).
       Then you would use TMPID to match with HHID in the household.
       The wealth index is computed for households, not for individuals.
       Therefore, the case ID for these files, WHHID, is the equivalent to the
       household ID. To merge the household and wealth index files, set the two
       ID variables (HHID and WHHID) equal. To merge an individual data file
       with the wealth index, follow the same procedure as described above for
       merging individual and household data, but instead of HHID, use WHHID.
       When merging files from the India NFHS-2 survey, you can still use HHID,
       WHHID, and CASEID as described above. But if you want to use the
       individual variables for cluster and household number, you must also use
       the State variables (HV024 in the households, and V024 in the women's
       files). When matching the households or individuals to the village data,
       you must also use the village number (since there can be more than one
       village in a cluster), which are SHLOCAL in the households, and SLOCAL in
       the women's files. The village number variable in the village data is

"Nick Cox"
Sent by: owner-statalist@hsphsun2.
09/07/2005 11:23 AM
Subject: RE: st: String problem.

This will not work for the problem stated,
as the digit "0" can occur as part of
an identifier.
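
A small made-up case shows the failure: running the three quoted commands on cluster 10, household 5 (values invented for illustration) also blanks out the zero that belongs to the cluster number.

```stata
* One invented household: cluster 10, household 5.
clear
set obs 1
generate hv001 = 10
generate hv002 = 5

* The suggested commands, unchanged.
generate hhid2 = hv001*100 + hv002
tostring hhid2, replace
replace hhid2 = subinstr(hhid2, "0", " ", .)

* "1005" has become "1  5": cluster 10 can no longer be told
* apart from cluster 1.
list, clean
```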


Ulrich Kohler

> Maybe something along the line of:
> . gen hhid2 = hv001*100 + hv002
> . tostring hhid2, replace
> . replace hhid2 = subinstr(hhid2,"0"," ",.)

> wrote:

> > I have problem generating a hid variable.
> > I want to reproduce an exact clone of my
> > hhid. It is 12 characters long.

*   For searches and help try: