Stata The Stata listserver

Re: st: How blanks are treated when vars are read in as string from an ASCII raw data file


From   Ian Watson <ian.watson@tpg.com.au>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How blanks are treated when vars are read in as string from an ASCII raw data file
Date   Thu, 24 Nov 2005 18:41:34 +1100

Clare

I've reproduced your problem and compared it with yesterday's solution and can only come up with one suggestion.

When you infile the string and then split it using the substr function, the right-hand component (which I called num in yesterday's post) has leading blanks on it. These are then replaced by leading 0s using the subinstr function.

However, when you infile the string as two strings, Stata possibly strips the leading blanks from each. Even though siteid has the designation of a str3 type, it may not have the same "contents" as num did (which was also a str3 type), because the latter was created with substr. That is, " 23" and "23" look the same on the screen, but they're not the same data.

This is only a guess, and I can't find an easy way to test it. But it suggests you're better off reading your string in as a full string and then splitting it, rather than reading it in as two strings. At least that works (even if the reason why is not altogether clear to me).
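
A minimal sketch of that approach, assuming the dictionary reads the whole identifier as a single variable with a line such as str6 id %6s. The names id, site, num and newid are illustrative only and do not come from the original files.

```stata
* Assumes id is a str6 variable holding the raw 6-column identifier,
* for example "B    1" or "CCH 23", with the interior blanks intact.

* Left part: the site code, with its padding blanks removed
generate str3 site = trim(substr(id, 1, 3))

* Right part: the number, still carrying its leading blanks
generate str3 num = substr(id, 4, 3)

* Turn the leading blanks into zeros, so " 23" becomes "023"
replace num = subinstr(num, " ", "0", .)

* Concatenate into the final identifier, e.g. "CCH023" or "B001"
generate str6 newid = site + num

list id site num newid

* Alternative that does not rely on leading blanks being present.
* Assumes siteid was read as a separate string and always holds a
* non-missing whole number:
* replace siteid = string(real(siteid), "%03.0f")
```

The commented alternative swaps subinstr for a numeric round trip through real() and string(), so it should pad correctly whether or not the blanks survived the infile.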

--
Kind regards,
Ian

-------------------------------
Ian Watson
Senior Researcher
acirrt, University of Sydney
NSW, 2006, Australia

phone: 02 9351 5622
email:i.watson@econ.usyd.edu.au
www.acirrt.com
-------------------------------




Clare L Maxwell wrote:

Hello again to the Statalist writers.

This is a reprise of a problem I asked about yesterday, involving a 6-place string subject identifier that is read in from an ASCII file. In the ASCII file, this is how the data look:

B    1
B  120
CCH  7
CCH 23
CCH213
UW  23
UW 232

I want to turn them into a variable that looks like this:

B001
B120
CCH007
CCH023
CCH213
UW023
UW232

I had two very helpful recommendations on how to accomplish this. However, the results were not as hoped.

I read the data into Stata as two str3 variables:

infile dictionary using "/Projects/Christina/HNS/Stata/data/cm1.raw" {
str3 site %3s
str3 siteid %3s
...... many more vars
}

I was surprised when they looked like this in Stata:

site   siteid
   B        1
   B      120
 CCH        7
 CCH       23
 CCH      213
  UW       23
  UW      232

In other words, now they both appeared as right justified once they had been read in (siteid was that way before in the ASCII file).

This saved me the trouble of manipulating site. However, when I tried to left pad siteid with zeros, as suggested by Ian Watson and Nick Cox:

replace siteid = subinstr(siteid," ","0",.)

I got the message that zero replacements have been made. When I concatenated the site and siteid to get my final ID variable, I got the following:

B1
B120
CCH7
CCH23
CCH213
UW23
UW232

This is similar to the problems I was having before my original letter. There must be something I do not understand about how blanks are treated in string variables created this way. In the ASCII file, these variables are not enclosed in quotation marks to signify that they are string variables. I just specify that they are strings, and their length, when they are read in. I later merge three files by these IDs plus dates, and in order to do that, I have to compress the files. I thought compression might have been the problem. Sadly, not compressing did not resolve this. I also just updated Stata. No change. Can anyone shed more light?

Thank you very much for your help.

Yours truly,
Clare Maxwell
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


