The Stata listserver

Re: st: reading data with infix: record too long


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: reading data with infix: record too long
Date   Mon, 21 Apr 2003 09:02:38 -0500

Stephan Mankart <smankart@uni-osnabrueck.de> wrote, 

> i am trying to read an ascii fixed format file [...]  at the moment i am
> only trying to extract a single variable from the file and [...]  i get the
> error:  record too long.  one of my suspicions was that stata gets confused
> about unix/windows new lines, [...]

Try the Stata command 

        . hexdump <filename>, analyze

For instance, if you are reading mydata.raw, type 

        . hexdump mydata.raw, analyze

Do not leave off the -analyze- option or you really will get a hexadecimal 
dump of the file.  From -hexdump, analyze-, you will get output that looks 
something like this:

------------------------------------------------------------------------------
  Line-end characters                        Line length (tab=1)
    \r\n         (DOS)                  0      minimum                       56
    \n by itself (Mac)                512      maximum                      147
    \n by itself (Unix)               512
  Space/separator characters                 Number of lines                512
    [blank]                         7,414      EOL at EOF?                  yes
    [tab]                               0
    [comma] (,)                        47    Length of first 5 lines
  Control characters                           Line 1                        80
    binary 0                            0      Line 2                        63
    CTL excl. \r, \n, \t                0      Line 3                        63
    DEL                                 0      Line 4                        63
    Extended (128-159,255)              0      Line 5                        73
  ASCII printable
    A-Z                             3,319
    a-z                            15,723    File format                  ASCII
    0-9                             6,154
    Special (!@#$ etc.)             3,917
    Extended (160-254)                  0
                          ---------------
  Total                            37,086

  Observed were:
     \n blank ( ) , - . 0 1 2 3 4 5 6 7 8 9 : < > @ A B C D E F G H J L M N O
     P R S T U V W _ a b c d e f g h i j k l m n o p q r s t u v w x y z
------------------------------------------------------------------------------

Pay particular attention to Line-end characters, Line length, Number of lines, 
and EOL at EOF? (which stands for "end-of-line at end-of-file").
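To show what those four items measure, here is a minimal Python sketch that
computes them from the raw bytes of a file.  It is not Stata's implementation
of -hexdump, analyze-; the function name analyze_line_ends and its dictionary
keys are hypothetical.  The idea is that if the line-end convention is not the
one the reader expects, or the maximum line length is unexpectedly large, a
whole file can look like one very long record.

```python
import sys


def analyze_line_ends(data: bytes) -> dict:
    """Rough, hypothetical imitation of the line-end summary of -hexdump, analyze-."""
    crlf = data.count(b"\r\n")                # DOS line ends
    cr_alone = data.count(b"\r") - crlf       # \r not followed by \n (old Mac)
    lf_alone = data.count(b"\n") - crlf       # \n not preceded by \r (Unix)

    # Normalize all three conventions to \n, then split into lines.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    eol_at_eof = data.endswith((b"\n", b"\r"))
    lines = normalized.split(b"\n") if data else []
    if eol_at_eof:
        lines = lines[:-1]                    # drop empty piece after final EOL

    lengths = [len(line) for line in lines]   # a tab counts as 1, as in Stata's output
    return {
        "crlf": crlf,
        "cr_alone": cr_alone,
        "lf_alone": lf_alone,
        "n_lines": len(lines),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "eol_at_eof": eol_at_eof,
        "first5": lengths[:5],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python analyze.py mydata.raw
    with open(sys.argv[1], "rb") as f:
        print(analyze_line_ends(f.read()))
```

Counting lone \r and lone \n by subtracting the \r\n pairs keeps the three
line-end categories disjoint, so a DOS file is not also reported as Unix.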

-- Bill
wgould@stata.com

P.S.  This is embarrassing:  in the output above, it says 

            \n by itself (Mac)       512
            \n by itself (Unix)      512

      when it should have said 

            \r by itself (Mac)         0
            \n by itself (Unix)      512

     Obviously, we have a bug: we called the end-of-line counting routine
     with \n twice, instead of once with \n and once with \r.  I have added
     that to the to-fix list.
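The mistake described in the P.S. can be reproduced in miniature.  The Python
sketch below is only an illustration of that kind of slip, not Stata's code;
count_lone is a hypothetical helper.  Passing \n for both the "Mac" and
"Unix" counts makes a pure Unix file report the same number under both
headings, which is exactly the 512/512 seen above.

```python
def count_lone(data: bytes, ch: bytes) -> int:
    """Count occurrences of ch that are not part of a \\r\\n pair."""
    return data.count(ch) - data.count(b"\r\n")


unix_file = b"line one\nline two\n"

# Buggy: both counts are taken with \n, so the Mac count duplicates the Unix count.
buggy = {"mac": count_lone(unix_file, b"\n"), "unix": count_lone(unix_file, b"\n")}

# Fixed: the Mac count uses \r and the Unix count uses \n.
fixed = {"mac": count_lone(unix_file, b"\r"), "unix": count_lone(unix_file, b"\n")}

print(buggy, fixed)
```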




© Copyright 1996–2014 StataCorp LP