Statalist The Stata Listserver



Re: st: trim spaces in postcode


From   Ronnie Babigumira <[email protected]>
To   [email protected]
Subject   Re: st: trim spaces in postcode
Date   Mon, 03 Apr 2006 11:16:29 +0200

Many thanks, Joseph.

Joseph Coveney wrote:
Ronnie Babigumira wrote:

Yes, I have used and continue to use -insheet- (mainly when I have
tab-delimited data from Excel), and specifying the string length is not a
problem with -insheet-. That said, there are situations where -infile- is the
appropriate command, and of course -input- is invaluable when I want to enter
a few observations. In these cases I have to specify the length of string
variables.

Ada Ma wrote:

Have you tried -insheet-?

<snip>
Is there something in newer versions of Stata that would save me from
guessing the length of strings when using -infile- and -input-?
</snip>

--------------------------------------------------------------------------------

For -infile-:

If your input file is space-delimited (that is, spaces aren't used to
represent missing values and there aren't internal spaces in strings), then
you can use -split- after -infix str v1 1-244 using <filename>- for record
lengths up to 244 bytes.  You can then -destring- to restore numeric
variables.
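
A minimal sketch of the space-delimited case described above. The sample
records and the variable stub x are made up for illustration, and the file is
written to a temporary location only so that the block runs on its own.

```stata
* Write two space-delimited records to a temporary file (illustrative data)
clear
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' "abc 3 1001.1" _n
file write `fh' "lmn -1 10000" _n
file close `fh'

* Read each record into one string variable, without guessing field widths
infix str v1 1-244 using "`raw'"

* Break the record into x1, x2, x3 (the default parse string is one space)
split v1, generate(x)

* Convert the fields that hold numbers back to numeric storage
destring x2 x3, replace
list x1 x2 x3, noobs
```

Here x1 stays a string, while x2 and x3 become numeric after -destring-.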

For multispace-delimited files (typically used where strings contain internal
spaces), I believe that you can specify a multispace parsing string
with -split-.  (See the first example below.)  Be aware that -infix- strips
leading spaces at the beginning of the record; -filefilter- can help to
remedy that beforehand if needed.

In cases where the input file is messy, you can use Stata's conventional
string functions and new regular expression functions after -infix str v1
1-244 using <filename>-.  I've just finished such a project (the data were
embedded in prettily formatted .pdf files), and Stata's regular expression
functions were a godsend.
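
A minimal sketch of the messy-file case, assuming the records have already
been read into a single string variable v1. The records, the postcode pattern,
and the variable names postcode and postcode_ns are hypothetical; a real
postcode format would need its own pattern.

```stata
* Two messy records typed in directly (illustrative data)
clear
input str244 v1
"Report for site 12,   postcode: AB1 2CD   (checked)"
"Site 7, postcode:XY9 8ZW"
end

* regexm() tests for a match; regexs(1) returns the first parenthesized group
generate str8 postcode = regexs(1) if regexm(v1, "postcode: *([A-Z][A-Z][0-9] [0-9][A-Z][A-Z])")

* A conventional string function then removes the internal space
generate str8 postcode_ns = subinstr(postcode, " ", "", .)
compress
list postcode postcode_ns, noobs
```

The pattern uses only bracketed character classes and the * quantifier,
which -regexm()- supports.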

If the record length is longer than 244, then I believe that you can -infix
str v1 1-244 str v2 245-488 . . . using <filename>-, and proceed as above.
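
A minimal sketch of the long-record case, assuming a reasonably recent Stata
(for -strlen()-). The 300-character record of repeated "x" is made up purely
to give a line longer than 244 columns.

```stata
* Write one 300-character record to a temporary file (illustrative data)
clear
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' _dup(300) "x" _n
file close `fh'

* Read the record in two pieces of at most 244 columns each
infix str v1 1-244 str v2 245-488 using "`raw'"

* Show how many characters landed in each piece
display strlen(v1[1])
display strlen(v2[1])
```

A field that straddles column 244 is cut in two by this layout, so it is worth
checking the boundary before running -split- on each piece.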

For -input-:

You don't actually need to guess string length in order to use -input-.
Just specify the maximum and away you go.  (See second example below.)

Joseph Coveney


. set obs 2
obs was 0, now 2

. input str244 a


                                  a
  1. "a  b  c d"
  2. "e  f g  h i"

. split a, generate(b) parse("  ")
variables created as string:
b1  b2  b3

. list b*, noobs

  +----------------+
  | b1    b2    b3 |
  |----------------|
  |  a     b   c d |
  |  e   f g   h i |
  +----------------+

. clear

. input str244 a byte b str244 c int d str244 e float f


             a         b          c         d          e          f
  1. abc 3 def 200 ghi 1001.1
  2. lmn -1 opq .m "" 10000
  3. end

. compress
a was str244 now str3
c was str244 now str3
e was str244 now str3

. list, noobs

  +-------------------------------------+
  |   a    b     c     d     e        f |
  |-------------------------------------|
  | abc    3   def   200   ghi   1001.1 |
  | lmn   -1   opq    .m          10000 |
  +-------------------------------------+

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



