Indeed, that is the case, and Uli's suggestion does not work.
In fact, the problem is deeper than that.
I have several Demographic and Health Surveys (DHS) data sets.
They come in different flavors (households, women, children, etc.).
I need to merge the household and women (individual) files.
I proceed as indicated by the DHS team (see below, and also a previous
thread on substr() that I posted last week, to which Nick graciously responded).
Unfortunately, for one country, there is no hhid in the household file.
But there is a caseid in the women file for that country.
I thus need to generate a hhid (based on cluster and household id)
in order to be able to merge the household data with individual data
(in which I can retrieve the hhid variable from the caseid one using substr()).
The caseid in the individual files contains hhid information.
The hhid is always 12 characters long, sometimes with leading and trailing
blank spaces (which makes matters worse).
I just can't figure out:
- how the DHS team constructs the hhid variable (i.e. with leading, trailing,
and sometimes embedded blanks)
- how to retrieve it in Stata.
One solution of course is to build hhid in household and individual files using
cluster and household ID variables (g hhid = hv001*100+hv002).
But I am wondering whether one could create the hhid variable exactly in
the DHS tradition
(i.e. 12 characters long). Because, ultimately, I need to pool my data and a
Said differently, why is it so difficult to create a second hhid2 (without using
that is exactly the same as hhid?
I hope this is clear enough.
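
One way to explore the question above is to rebuild the identifier in the women's file, where the true value is available as the first 12 characters of caseid, and count the mismatches. This is only a sketch: the file name and the assumed layout (cluster right-justified in 8 columns, household number in 4) are guesses that may well differ by survey, and c, h, and hhid2 are names made up for illustration.

```stata
* Women's file (hypothetical file name); v001 = cluster, v002 = household
use women, clear

* Numbers as text, without any padding
gen str8 c = trim(string(v001, "%8.0f"))
gen str4 h = trim(string(v002, "%4.0f"))

* ASSUMED layout: cluster right-justified in 8 columns, household in 4.
* Blanks are added explicitly, so genuine "0" digits are left untouched.
gen str12 hhid2 = substr("        ", 1, 8 - length(c)) + c ///
                + substr("    ", 1, 4 - length(h)) + h

* Zero mismatches would mean the assumed layout reproduces the DHS value
count if hhid2 != substr(caseid, 1, 12)
```

If the count is not zero, listing a few values of substr(caseid, 1, 12) next to v001 and v002 should show which column widths the survey actually uses; the same rule can then be applied to hv001 and hv002 in the household file.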
Q: How do I merge household, women, men, and wealth DHS data files?
A: When merging women's and men's data files with their households, you
need to use the cluster and household numbers. Since there is a
"one-to-many" relationship between households and individuals, you should
start with the individual data, women or men, as your "base" (or "unit of
analysis") and locate the correct household for each person.
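
In Stata, this many-to-one match on cluster and household numbers could be sketched as follows, using the variable names the FAQ gives for the women's file (V001, V002) and the household file (HV001, HV002). The file names are hypothetical, and merge m:1 is the syntax of Stata 11 and later; older releases need both files sorted on the keys and the old merge syntax.

```stata
* Women's file is the base; rename its keys to match the household file
use women, clear
rename v001 hv001
rename v002 hv002

* Many women per household, one household record per (cluster, household)
merge m:1 hv001 hv002 using household

* Check how many women found their household
tabulate _merge
```

Matching on the two numeric variables avoids the string identifier entirely, which may be the simplest route when hhid is missing from one file.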
In the household data, the cluster number is stored in HV001, and the
household is in HV002. In the women's data, the cluster is V001, the
household is V002. In the men's data, the equivalent variables are MV001
Another alternative is to use the household and individual case ID
variables. The household ID is HHID, and is 12 characters long. In
general, this variable will consist of the cluster and household numbers,
but there are exceptions (see notes below). The case ID variable for
women is CASEID, which is 15 characters long. It consists of the
household ID for that person, with the person's 3-digit line number
appended to the end. So in SPSS, you could create a variable TMPID:
COMPUTE TMPID = SUBSTR(CASEID,1,12).
Then you would use TMPID to match with HHID in the household.
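
A rough Stata counterpart of the SPSS step above, assuming the household file does contain HHID as a 12-character string; the file names are hypothetical and merge m:1 requires Stata 11 or later.

```stata
* Women's file: the first 12 characters of caseid are the household ID
use women, clear
gen str12 tmpid = substr(caseid, 1, 12)

* Name the key as in the household file, then match many women to one household
rename tmpid hhid
merge m:1 hhid using household
tabulate _merge
```

Because substr() copies the characters as they are, any leading or embedded blanks in caseid carry over unchanged, so the key should match HHID without any trimming.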
The wealth index is computed for households, not for individuals.
Therefore, the case ID for these files, WHHID, is the equivalent to the
household ID. To merge the household and wealth index files, set the two
ID variables (HHID and WHHID) equal. To merge an individual data file
with the wealth index, follow the same procedure as described above for
merging individual and household data, but instead of HHID, use WHHID.
When merging files from the India NFHS-2 survey, you can still use HHID,
WHHID, and CASEID as described above. But if you want to use the
individual variables for cluster and household number, you must also use
the State variables (HV024 in the households, and V024 in the women's
files). When matching the households or individuals to the village data,
you must also use the village number (since there can be more than one
village in a cluster), which are SHLOCAL in the households, and SLOCAL in
the women's files. The village number variable in the village data is
From: <email@example.com>
To: <firstname.lastname@example.org>
Sent by: owner-statalist@hsphsun2.
Subject: RE: st: String problem.
Date: 09/07/2005 11:23 AM
This will not work for the problem stated,
as the digit "0" can occur as part of
> Maybe something along the line of:
> . gen hhid2 = hv001*100 + hv002
> . tostring hhid2, replace
> . replace hhid2 = subinstr(hhid2,"0"," ",.)
> email@example.com wrote:
> > I have a problem generating a hhid variable.
> > I want to reproduce an exact clone of my
> > hhid. It is 12 characters long.
* For searches and help try: