RE: st: String problem.

From   "Steichen, Thomas J."
Subject   RE: st: String problem.
Date   Wed, 7 Sep 2005 12:15:24 -0400

Part of the problem is that Stata does not pad a string value with leading blanks when the value is shorter than the variable's
declared width.  Thus if s is a str3 variable and 12 is its value in one observation, that value is really "12" rather than " 12".
So, to get the appropriate number of spaces between two concatenated strings, you will need to pad each string out to its full
length with blanks. For your example:

  . tostring hv001, gen(h1)
  . tostring hv002, gen(h2)
  . replace h1=" " + h1 if length(h1) == 1
  . replace h2=" " + h2 if length(h2) == 1
  . replace h2=" " + h2 if length(h2) == 2

Then concatenate:

  . gen str12 hhid2 = h1 + h2
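
The length-by-length replace commands can also be collapsed into one expression per variable. The lines below are only a
sketch: they assume the same widths as the example (2 characters for the cluster, 3 for the household), which may not be the
widths DHS actually uses, and the names p1, p2 and hhid3 are made up for illustration. Prepending blanks and then keeping the
last k characters with substr() right-justifies each piece in a field of width k:

  . * prepend blanks, then keep only the last 2 (or 3) characters
  . gen str2 p1 = substr("  " + h1, -2, 2)
  . gen str3 p2 = substr("   " + h2, -3, 3)
  . * check that every value now has the full width
  . assert length(p1) == 2 & length(p2) == 3
  . gen str5 hhid3 = p1 + p2

Note that a value longer than the assumed width would be truncated on the left, so it is worth checking the maximum of
length(h1) and length(h2) first.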

I can conceive of no way, though, to programmatically cope with the fact that the surveys are inconsistent with regard to leading
and trailing blanks.  Unless you can come up with a rule that defines this practice, you are stuck!
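
One way to look for such a rule is to tabulate where the blanks fall in an hhid that the DHS team built. This is a rough
sketch and assumes you have at least one file in which hhid already exists; the variable names nlead and nblank are
hypothetical:

  . * number of leading blanks in each existing hhid value
  . gen byte nlead = length(hhid) - length(ltrim(hhid))
  . * number of blanks anywhere in the value
  . gen byte nblank = length(hhid) - length(subinstr(hhid, " ", "", .))
  . tabulate nlead nblank

If the counts move in step with the number of digits in hv001 and hv002, that points to a padding rule you can reproduce; if
they do not, the inconsistency is genuine.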


Thomas J. Steichen
  Facts do not cease to exist because they are ignored. - Aldous Huxley

> -----Original Message-----
> Sent: Wednesday, September 07, 2005 11:49 AM
> Subject: RE: st: String problem.
> Hi,
> Indeed, it is the case and Uli's suggestion is not working.
> In fact, the problem is deeper than that.
> I have several Demographic and Health Surveys (DHS) data 
> sets. They come in different flavors (households, women, 
> children, etc...) I need to merge households and women 
> (individuals) files. I proceed as indicated by the DHS team 
> (see below and also a previous thread on substr() I posted 
> last week, to which Nick graciously responded).
> But unfortunately, for one country, there is no hhid in the 
> household file. But there is a caseid in the women file for 
> that country. I need thus to generate a hhid (based on 
> cluster and household id) in order to be able to merge the 
> household data with individual data (in which I can retrieve 
> the hhid variable from the caseid one using substr()). The 
> caseid in the individual files contains hhid information. The 
> hhid is always 12 characters long, sometimes with leading and 
> trailing blank spaces (which makes matters worse).
> I just can't figure out:
> - how the DHS team constructs the hhid variable (i.e. with 
> leading and trailing blanks and sometimes blanks within)
> - how to retrieve it in Stata.
> One solution of course is to build hhid in household and 
> individual files using cluster and household ID variables (g 
> hhid = hv001*100+hv002). But I am wondering if one could 
> create the hhid variable exactly in the DHS 
> tradition (i.e. 12 characters long). Because, ultimately, 
> I need to pool my data, and uniformity is desirable. Said 
> differently, why is it so difficult to create a second hhid2 
> (without using clonevar) that is exactly the same as hhid?
> I hope this is clear enough.
> Best regards.
> Amadou.