Statalist The Stata Listserver


Re: st: trim spaces in postcode

From   Joseph Coveney
To   Statalist
Subject   Re: st: trim spaces in postcode
Date   Mon, 03 Apr 2006 17:02:16 +0900

Ronnie Babigumira wrote:

Yes I have used and continue to use -insheet- (mainly if I have tab
delimited data from excel) and specifying the string length is not a problem
with -insheet-. That said, there are situations where -infile- is the
appropriate command and of course -input- is invaluable when I want to input
a few entries. In this case I have to specify the length of string variables.

Ada Ma wrote:

Have you tried -insheet-?

Is there something in newer versions of Stata that would save me from
guessing the length of strings when using -infile- and -input-?


For -infile-:

If your input file is space-delimited (that is, spaces aren't used to
represent missing values and there aren't internal spaces in strings), then
you can use -split- after -infix str v1 1-244 using <filename>- for record
lengths up to 244 bytes.  You can then -destring- to restore numeric
variables.
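
The single-space case described above could be sketched as follows. This is only an illustration: the file name mydata.txt and the variable stub x are assumptions, not names from the original question, and the records are assumed to be no longer than 244 bytes.

```stata
* Hypothetical file name; assumes space-delimited records of at most 244 bytes
infix str v1 1-244 using "mydata.txt", clear

* Split each record on spaces (the default parse string) into x1, x2, ...
split v1, generate(x)

* Convert the pieces that hold only numbers to numeric variables;
* pieces with nonnumeric characters are left as strings
destring x*, replace

* Drop the raw record and shrink string variables to the shortest type that fits
drop v1
compress
```

Because -compress- trims the string variables afterwards, starting from the maximum width of 244 should cost nothing in the final dataset.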

For multispace-delimited files (typically used, for example, where there
are internal spaces in strings), I believe that you can specify a
multispace parsing string with -split-.  (See first example below.)  Be
aware that -infix- strips leading spaces from each record; -filefilter-
can help to remedy that beforehand if needed.

In cases where the input file is messy, you can use Stata's conventional
string functions and new regular expression functions after -infix str v1
1-244 using <filename>-.  I've just finished such a project (the data were
embedded in prettily formatted .pdf files), and Stata's regular expression
functions were a godsend.
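
As a rough sketch of the regular-expression approach, suppose (hypothetically) that some records contain a label such as "Total:" followed by a number. The label, the pattern, and the variable name total are assumptions made for illustration only; they are not taken from the project mentioned above.

```stata
* Assumes v1 holds one raw record per observation, for example read with -infix-
* Hypothetical pattern: capture the digits that follow the label "Total:"
generate str244 total = regexs(1) if regexm(v1, "Total: *([0-9]+)")

* Convert the captured digits to a numeric variable;
* records without a match become missing
destring total, replace
```

The -regexm()- function tests each record against the pattern, and -regexs(1)- returns the text matched by the first parenthesized subexpression.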

If the record length is longer than 244, then I believe that you can -infix
str v1 1-244 str v2 245-488 . . . using <filename>-, and proceed as above.

For -input-:

You don't actually need to guess string length in order to use -input-.
Just specify the maximum and away you go.  (See second example below.)

Joseph Coveney

. set obs 2
obs was 0, now 2

. input str244 a

>                                   a
  1. "a  b  c d"
  2. "e  f g  h i"

. split a, generate(b) parse("  ")
variables created as string:
b1  b2  b3

. list b*, noobs

  | b1    b2    b3 |
  |  a     b   c d |
  |  e   f g   h i |

. clear

. input str244 a byte b str244 c int d str244 e float f

>                                   a         b
> c         d
>                 e          f
  1. abc 3 def 200 ghi 1001.1
  2. lmn -1 opq .m "" 10000
  3. end

. compress
a was str244 now str3
c was str244 now str3
e was str244 now str3

. list, noobs

  |   a    b     c     d     e        f |
  | abc    3   def   200   ghi   1001.1 |
  | lmn   -1   opq    .m          10000 |
