The Stata listserver

Re: st: How blanks are treated when vars are read in as string from an ASCII raw data file


From   Richard Palmer-Jones <richard.palmerjones@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How blanks are treated when vars are read in as string from an ASCII raw data file
Date   Thu, 24 Nov 2005 13:30:46 +0500

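* if your variables are strvar1 & strvar2
gen str6 newvar = ""
* describe leaves the number of observations in r(N)
qui des
local N = r(N)
forval i = 1/`N' {
    * copy this observation's two string values into local macros
    local t1 = strvar1[`i']
    local t2 = strvar2[`i']
    * pick the zeros needed to pad t2 to three characters
    local leng2 = length("`t2'")
    local dum2 = ""
    if `leng2' == 1 {
        local dum2 = "00"
    }
    else if `leng2' == 2 {
        local dum2 = "0"
    }
    * join prefix, padding, and number, then store in this observation
    local var = "`t1'`dum2'`t2'"
    qui replace newvar = "`var'" in `i'
}
list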
list

     | strvar1   strvar2   newvar |
     |----------------------------|
  1. |       B         1     B001 |
  2. |       B       120     B120 |
  3. |     CCH         7   CCH007 |
  4. |     CCH        23   CCH023 |
  5. |     CCH       213   CCH213 |
     |----------------------------|
  6. |      UW        23    UW023 |
  7. |      UW       232    UW232 |
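
The same padding can probably be done without looping over observations. What follows is only a sketch, not a tested replacement for the loop: it assumes strvar2 holds nothing but digits (possibly with stray blanks), and newvar2 is a name made up for the illustration.

```stata
* rebuild the example data so the sketch runs on its own
clear
input str3 strvar1 str3 strvar2
"B"   "1"
"B"   "120"
"CCH" "7"
"CCH" "23"
"CCH" "213"
"UW"  "23"
"UW"  "232"
end

* real() turns the digits into a number; the %03.0f format writes it
* back as text with leading zeros; trim() guards against stray blanks
gen str6 newvar2 = strvar1 + string(real(trim(strvar2)), "%03.0f")
list
```

If strvar2 could be empty or contain non-digits, real() returns missing and the result would need checking before it is used as an ID.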

Richard

On 11/24/05, Ian Watson <ian.watson@tpg.com.au> wrote:
> Clare
>
> I've reproduced your problem and compared it with yesterday's solution
> and can only come up with one suggestion.
>
> When you infile the string and then split it, using the substr function,
> the right hand component (which I called num in yesterday's post) has
> leading blanks on it. These are then replaced by leading 0s using the
> subinstr function.
>
> However, when you infile the string as two strings, Stata possibly
> strips the leading blanks from it. Even though it has the designation of
> a str3 type, it may not have the same "contents" as num did (which was
> also a str3 type), because the latter was created with substr. That
> is, " 23" and "23" look the same on the screen, but they're not the same
> data.
>
> This is only a guess, and I can't find an easy way to test it. But it
> suggests you're better off reading your string in as a full string, then
> splitting it, rather than as two strings. At least that works (even if
> the reason why is not altogether clear to me).
>
> --
> Kind regards,
> Ian
>
> -------------------------------
> Ian Watson
> Senior Researcher
> acirrt, University of Sydney
> NSW, 2006, Australia
>
> phone: 02 9351 5622
> email:i.watson@econ.usyd.edu.au
> www.acirrt.com
> -------------------------------
>
>
>
>
> Clare L Maxwell wrote:
> > Hello again to the Statalist writers.
> >
> > This is a reprise of a problem I asked about yesterday, involving a
> > 6-place string subject identifier that is read in from an ASCII file. In
> > the ASCII file, this is how the data look:
> >
> > B    1
> > B  120
> > CCH  7
> > CCH 23
> > CCH213
> > UW  23
> > UW 232
> >
> > I want to turn them into a variable that looks like this:
> >
> >   B001
> >   B120
> > CCH007
> > CCH023
> > CCH213
> >  UW023
> >  UW232
> >
> > I had two very helpful recommendations on how to accomplish this.
> > However, the results were not as hoped.
> >
> > I read the data into Stata as two str3 variables:
> >
> > infile dictionary using "/Projects/Christina/HNS/Stata/data/cm1.raw" {
> >   str3       site         %3s
> >   str3       siteid       %3s
> > ...... many more vars
> >   }
> >
> > I was surprised when they looked like this in Stata:
> >
> > site siteid
> >    B      1
> >    B    120
> >  CCH      7
> >  CCH     23
> >  CCH    213
> >   UW     23
> >   UW    232
> >
> > In other words, now they both appeared as right justified once they had
> > been read in (siteid was that way before in the ASCII file).
> >
> > This saved me the trouble of manipulating site.  However, when I tried
> > to left pad siteid with zeros, as suggested by Ian Watson and Nick Cox:
> >
> > replace siteid = subinstr(siteid," ","0",.)
> >
> > I got the message that zero replacements have been made.  When I
> > concatenated the site and siteid to get my final ID variable, I got the
> > following:
> >
> >     B1
> >   B120
> >   CCH7
> >  CCH23
> > CCH213
> >   UW23
> >  UW232
> >
> > This is similar to problems I was having before my original letter.
> > There must be something I do not understand about how blanks are treated in
> > string variables created this way.  In the ASCII file, these variables
> > are not enclosed in quotation marks to signify that they are string
> > variables.  I just say string and how long when they are read in.  I
> > later merge three files by these IDs plus dates, and in order to do
> > that, I have to compress the files.  I thought compression might have
> > been the problem.  Sadly, not compressing did not resolve this.  I also
> > just updated Stata.  No change.  Can anyone shed more light?
> >
> > Thank you very much for your help.
> >
> >                 Yours truly,
> >                 Clare Maxwell
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
> >
> >
>
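
Ian's guess in the quoted message, that " 23" and "23" display alike but are different data, can be checked with length(). A minimal sketch follows; the variable names padded and stripped are invented for the illustration and do not come from Clare's file.

```stata
* one observation holding the two versions of the same ID number
clear
set obs 1
gen str3 padded   = " 23"
gen str3 stripped = "23"

* length() counts the leading blank, so the two results differ
display length(padded[1])
display length(stripped[1])

* subinstr() can only replace blanks that are actually stored
replace padded   = subinstr(padded, " ", "0", .)
replace stripped = subinstr(stripped, " ", "0", .)
list
```

Running length(siteid) on the real data in the same way should show whether the blanks survived being read in, which would account for a report of zero replacements.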



