st: Getting rid of binary codes so I can read in files

From   "Orian Brook"
Subject   st: Getting rid of binary codes so I can read in files
Date   Tue, 10 Jan 2012 15:07:55 -0000

Dear all

I'm analysing administrative data which I've had to export from an online
database into 105 files. I've previously worked with similar files by
importing and combining them all in Outlook, then reading them into Stata
using an ODBC link. I'd really like to do it all in Stata (so that I have
the do-file for repetition/audit trail purposes), but I have some problems.

The original files have extra EOL characters, and extended ones, which I can
get rid of using filefilter, but I still can't import the file. Using
insheet I get the correct number of rows and columns, but all cells are
blank except the first (it has a t in it). I've also tried using infile and
skipping the first line, to no avail. Running hexdump shows that I have over
2 million binary 0s, which I think may be the problem. I tried using the
command "filefilter file1 file2, from(\00hd) to() replace" to get rid of
them, but it hangs.

Any help would be very gratefully received. The hexdump is below.


Orian Brook

  Line-end characters                        Line length (tab=1)
    \r\n         (Windows)         26,823      minimum
    \r by itself (Mac)                  0      maximum
    \n by itself (Unix)                 0
  Space/separator characters                 Number of lines
    [blank]                       107,191      EOL at EOF?
    [tab]                               0
    [comma] (,)                   509,637    Length of first 5 lines
  Control characters                           Line 1
    binary 0                    2,747,580      Line 2
    CTL excl. \r, \n, \t                0      Line 3
    DEL                                 0      Line 4
    Extended (128-159,255)              0      Line 5
  ASCII printable
    A-Z                           189,766
    a-z                           189,754    File format
    0-9                         1,509,729
    Special (!@#$ etc.)           187,857
    Extended (160-254)                  0
  Total                         5,495,160

  Observed were:
     \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N
     P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y
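
One observation from the hexdump above: the binary 0 count (2,747,580) is
exactly half of the total (5,495,160), and every other observed character is
plain ASCII. That pattern is typical of text saved in a two-byte Unicode
encoding such as UTF-16, although the output alone does not prove it. If that
is what happened, deleting the zero bytes should leave an ordinary ASCII file.
The lines below are a minimal sketch under that assumption, not a tested
solution. The file names are placeholders, and the sketch assumes that
filefilter accepts an empty quoted pattern in to() as "replace with nothing".
filefilter documents \##h (two hex digits) and \###d (three decimal digits)
as its escape forms, so the zero byte is written here as \00h.

```stata
* Sketch only: file1.csv and file2.csv are placeholder names.
* Assumption: the zero bytes are padding from a two-byte encoding,
* so deleting them loses no information.

* Delete every zero byte (hex 00) and write the result to a new file.
filefilter file1.csv file2.csv, from("\00h") to("") replace

* Check that the "binary 0" count is now 0 before importing.
hexdump file2.csv, analyze

* Read the cleaned comma-separated file.
insheet using file2.csv, comma clear
```

If the second hexdump still reports binary 0s, or characters outside the
ASCII range, the encoding assumption is wrong and the file needs a proper
conversion rather than a byte deletion.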

