The Stata listserver

Re: st: How blanks are treated when vars are read in as string from an ASCII raw data file


From   Sergio Correia <[email protected]>
To   [email protected]
Subject   Re: st: How blanks are treated when vars are read in as string from an ASCII raw data file
Date   Thu, 24 Nov 2005 03:44:18 -0500

I agree with Nick's suggestion.

I'm not 100% sure but I think this would be easier:
  gen asd=site+substr("00"+siteid,-3,3)

If there are leading or trailing blanks you can just do -trim(siteid)- and
-trim(site)- (see help strfun).
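
Putting the two pieces together, here is a minimal sketch of the idea, assuming the seven sample records from Clare's post have been read in as two str3 variables. The data are re-entered with -input- only so the example is self-contained, and the new variable name id is hypothetical:

```stata
* Sketch only: rebuild the sample data from the thread
clear
input str3 site str3 siteid
"B"   "1"
"B"   "120"
"CCH" "7"
"CCH" "23"
"CCH" "213"
"UW"  "23"
"UW"  "232"
end

* trim() drops leading and trailing blanks, so " 23" and "23" are treated alike.
* Prefixing "00" and keeping the last 3 characters left-pads siteid with zeros.
generate str6 id = trim(site) + substr("00" + trim(siteid), -3, 3)

list site siteid id
```

Because the padding is done on the trimmed value, this should not depend on whether the blanks survived the -infile- step.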


Sergio


On 11/24/05, Richard Palmer-Jones <[email protected]> wrote:
> * if your variables are strvar1 & strvar2
> gen str6 newvar = ""
> qui des
> local N = r(N)
> forval i = 1/`N' {
>     local t1 = strvar1[`i']
>
>     local t2 = strvar2[`i']
>     local leng2 = length("`t2'")
>     local dum2 = ""
>     if `leng2' == 1 {
>         local dum2 = "00"
>     }
>     else if `leng2' == 2 {
>         local dum2 = "0"
>     }
>     local var = "`t1'`dum2'`t2'"
>     replace newvar = "`var'" in `i'
> }
> list
>
>      | strvar1   strvar2   newvar |
>      |----------------------------|
>   1. |       B         1     B001 |
>   2. |       B       120     B120 |
>   3. |     CCH         7   CCH007 |
>   4. |     CCH        23   CCH023 |
>   5. |     CCH       213   CCH213 |
>      |----------------------------|
>   6. |      UW        23    UW023 |
>   7. |      UW       232    UW232 |
>
> Richard
>
> On 11/24/05, Ian Watson <[email protected]> wrote:
> > Clare
> >
> > I've reproduced your problem and compared it with yesterday's solution
> > and can only come up with one suggestion.
> >
> > When you infile the string and then split it, using the substr function,
> > the right hand component (which I called num in yesterday's post) has
> > leading blanks on it. These are then replaced by leading 0s using the
> > subinstr function.
> >
> > However, when you infile the string as two strings, Stata possibly
> > strips the leading blanks from it. Even though it has the designation of
> > a str3 type, it may not have the same "contents" as num did (which was
> > also a str3 type) because the latter was created from substr. That
> > is " 23" and "23" look the same on the screen, but they're not the same
> > data.
> >
> > This is only a guess, and I can't find an easy way to test it. But it
> > suggests you're better off reading your string in as a full string, then
> > splitting it, rather than as two strings. At least that works (even if
> > it is not altogether clear to me why).
> >
> > --
> > Kind regards,
> > Ian
> >
> > -------------------------------
> > Ian Watson
> > Senior Researcher
> > acirrt, University of Sydney
> > NSW, 2006, Australia
> >
> > phone: 02 9351 5622
> > email:[email protected]
> > www.acirrt.com
> > -------------------------------
> >
> >
> >
> >
> > Clare L Maxwell wrote:
> > > Hello again to the Statalist writers.
> > >
> > > This is a reprise of a problem I asked about yesterday, involving a
> > > 6-place string subject identifier that is read in from an ASCII file. In
> > > the ASCII file, this is how the data look:
> > >
> > > B    1
> > > B  120
> > > CCH  7
> > > CCH 23
> > > CCH213
> > > UW  23
> > > UW 232
> > >
> > > I want to turn them into a variable that looks like this:
> > >
> > >   B001
> > >   B120
> > > CCH007
> > > CCH023
> > > CCH213
> > >  UW023
> > >  UW232
> > >
> > > I had two very helpful recommendations on how to accomplish this.
> > > However, the results were not as hoped.
> > >
> > > I read the data into Stata as two str3 variables:
> > >
> > > infile dictionary using "/Projects/Christina/HNS/Stata/data/cm1.raw" {
> > >   str3       site         %3s
> > >   str3       siteid       %3s
> > > ...... many more vars
> > >   }
> > >
> > > I was surprised when they looked like this in Stata:
> > >
> > > site siteid
> > >    B      1
> > >    B    120
> > >  CCH      7
> > >  CCH     23
> > >  CCH    213
> > >   UW     23
> > >   UW    232
> > >
> > > In other words, now they both appeared as right justified once they had
> > > been read in (siteid was that way before in the ASCII file).
> > >
> > > This saved me the trouble of manipulating site.  However, when I tried
> > > to left pad siteid with zeros, as suggested by Ian Watson and Nick Cox:
> > >
> > > replace siteid = subinstr(siteid," ","0",.)
> > >
> > > I got the message that zero replacements have been made.  When I
> > > concatenated the site and siteid to get my final ID variable, I got the
> > > following:
> > >
> > >     B1
> > >   B120
> > >   CCH7
> > >  CCH23
> > > CCH213
> > >   UW23
> > >  UW232
> > >
> > > This is similar to problems I was having before my original letter.
> > > There must be something I do not understand about how blanks are treated in
> > > string variables created this way.  In the ASCII file, these variables
> > > are not enclosed in quotation marks to signify that they are string
> > > variables.  I just say string and how long when they are read in.  I
> > > later merge three files by these IDs plus dates, and in order to do
> > > that, I have to compress the files.  I thought compression might have
> > > been the problem.  Sadly, not compressing did not resolve this.  I also
> > > just updated Stata.  No change.  Can anyone shed more light?
> > >
> > > Thank you very much for your help.
> > >
> > >                 Yours truly,
> > >                 Clare Maxwell
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/