Stata The Stata listserver

st: How blanks are treated when vars are read in as string from an ASCII raw data file


From   Clare L Maxwell <maxwellcl1@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   st: How blanks are treated when vars are read in as string from an ASCII raw data file
Date   Wed, 23 Nov 2005 20:09:51 -0600

Hello again to the Statalist writers.

This is a reprise of a problem I asked about yesterday, involving a 6-place string subject identifier that is read in from an ASCII file. In the ASCII file, this is how the data look:

B    1
B  120
CCH  7
CCH 23
CCH213
UW  23
UW 232

I want to turn them into a variable that looks like this:

B001
B120
CCH007
CCH023
CCH213
UW023
UW232

I had two very helpful recommendations on how to accomplish this. However, the results were not as hoped.

I read the data into Stata as two str3 variables:

infile dictionary using "/Projects/Christina/HNS/Stata/data/cm1.raw" {
str3 site %3s
str3 siteid %3s
...... many more vars
}

I was surprised when they looked like this in Stata:

site  siteid
   B       1
   B     120
 CCH       7
 CCH      23
 CCH     213
  UW      23
  UW     232

In other words, both variables appeared right-justified once they had been read in (siteid was already right-justified in the ASCII file; site was not).

This saved me the trouble of manipulating site. However, when I tried to left-pad siteid with zeros, as suggested by Ian Watson and Nick Cox:

replace siteid = subinstr(siteid," ","0",.)

I got the message that zero replacements had been made. When I concatenated site and siteid to get my final ID variable, I got the following:

B1
B120
CCH7
CCH23
CCH213
UW23
UW232
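
One way to check whether siteid actually holds leading blanks after the read is sketched below. This is a minimal sketch, not a confirmed fix: it assumes the variables are named site and siteid as in the dictionary above, that every siteid value is purely numeric, and the names len_siteid, siteid_padded and id are hypothetical, introduced only for illustration.

```stata
* Sketch only: site and siteid come from the dictionary above;
* len_siteid, siteid_padded and id are hypothetical names.

* How many characters are actually stored in each value?
* A length below 3 means there are no stored blanks for subinstr() to replace.
generate len_siteid = length(siteid)
list site siteid len_siteid

* Assumption: siteid is purely numeric. Convert to a number and
* write it back as a string with leading zeros, width 3.
generate str3 siteid_padded = string(real(trim(siteid)), "%03.0f")

* Concatenate, trimming any blanks that may be stored in site.
generate str6 id = trim(site) + siteid_padded
list site siteid id
```

If len_siteid turns out to be 3 for every observation, the blanks are stored after all and the subinstr() approach should have found them, so the check at least narrows down where the problem lies.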

This is similar to the problems I was having before my original letter. There must be something I do not understand about how blanks are treated in string variables created this way. In the ASCII file, these variables are not enclosed in quotation marks to signify that they are strings; I simply declare them as string, with a length, when they are read in. I later merge three files by these IDs plus dates, and in order to do that I have to compress the files. I thought compression might have been the problem. Sadly, not compressing did not resolve this. I also just updated Stata, with no change. Can anyone shed more light?

Thank you very much for your help.

Yours truly,
Clare Maxwell