
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Getting rid of binary codes so I can read in files


From   "Orian Brook" <ob11@st-andrews.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Getting rid of binary codes so I can read in files
Date   Tue, 10 Jan 2012 15:07:55 -0000

Dear all

I'm analysing administrative data which I've had to export from an online
database into 105 files. I've previously worked with similar files by
importing and combining them all in Outlook, then reading them into Stata
using an ODBC link, but I'd really like to do it all in Stata (so that I
have the do-file for repetition and audit-trail purposes). However, I have
some problems.

The original files have extra end-of-line (EOL) characters, as well as
extended characters, which I can remove using filefilter, but I still can't
import the file. Using insheet, I get the correct number of rows and
columns, but all cells are blank except the first (which contains a "t").
I've also tried using infile and skipping the first line, to no avail.
Running hexdump shows that the file contains 2,747,580 binary 0s (NUL
bytes), which I think may be the problem. I tried the command
"filefilter file1 file2, from(\00hd) to() replace" to remove them, but it
hangs.
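To illustrate the byte-level operation the filefilter call above is aiming for, here is a minimal sketch in Python (used only because it is easy to test; it is not the Stata solution the post asks for). It copies a file while deleting every NUL byte and reports how many were dropped. The function name, file names and chunk size are hypothetical. One detail worth checking against the filefilter help file: if I read its escape syntax correctly, a byte is written as \###d (three decimal digits) or \##h (two hexadecimal digits), so a NUL would be \000d or \00h, and "\00hd" as written above may instead match a NUL followed by a literal "d".

```python
"""Sketch: remove every NUL (binary 0) byte from a file, i.e. the
byte-level effect the filefilter call is aiming for.
Function and file names are hypothetical placeholders."""
from pathlib import Path


def strip_nul_bytes(src: Path, dst: Path, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without any 0x00 bytes; return how many were dropped."""
    dropped = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            # A single-byte pattern cannot straddle a chunk boundary,
            # so a plain per-chunk replace is safe.
            cleaned = chunk.replace(b"\x00", b"")
            dropped += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return dropped
```

For example, strip_nul_bytes(Path("file1.csv"), Path("file2.csv")) would write a NUL-free copy of the hypothetical file1.csv. Reading in fixed-size chunks keeps memory use flat even for files of several megabytes.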

Any help would be very gratefully received. The hexdump is below.

Regards

Orian Brook


  Line-end characters                        Line length (tab=1)
    \r\n         (Windows)         26,823      minimum               2
    \r by itself (Mac)                  0      maximum             403
    \n by itself (Unix)                 0
  Space/separator characters                 Number of lines    26,824
    [blank]                       107,191      EOL at EOF?          no
    [tab]                               0
    [comma] (,)                   509,637    Length of first 5 lines
  Control characters                           Line 1              403
    binary 0                    2,747,580      Line 2              185
    CTL excl. \r, \n, \t                0      Line 3              243
    DEL                                 0      Line 4              245
    Extended (128-159,255)              0      Line 5              245
  ASCII printable
    A-Z                           189,766
    a-z                           189,754    File format        BINARY
    0-9                         1,509,729
    Special (!@#$ etc.)           187,857
    Extended (160-254)                  0
                          ---------------
  Total                         5,495,160

  Observed were:
     \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O
     P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y
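One possible reading of these counts (an inference from the table, not something confirmed in the post): the number of binary 0s, 2,747,580, equals the sum of every other category (2 x 26,823 bytes for the \r\n pairs, plus 107,191 + 509,637 + 189,766 + 189,754 + 1,509,729 + 187,857), and twice that is exactly the total of 5,495,160. That is the pattern a UTF-16 export of plain ASCII text would produce, where each character occupies two bytes, one of them zero. The line-ending counts do not fit an untouched UTF-16 file (there, \r and \n would be separated by a zero byte), so the earlier filefilter pass may have changed the layout, and the check is best run on an original export. The Python sketch below (again only an illustration, with hypothetical function names) tests whether every other byte is zero and, if so, transcodes the file to UTF-8.

```python
"""Sketch: test the hypothesis that a file is UTF-16 text whose characters
are all ASCII (every other byte is 0x00), and transcode it to UTF-8 if so.
Function and file names are hypothetical."""
from pathlib import Path
from typing import Optional


def nul_profile(data: bytes) -> dict:
    """Count NUL bytes overall and at even vs odd byte offsets."""
    nul_even = data[0::2].count(0)
    nul_odd = data[1::2].count(0)
    nul = nul_even + nul_odd
    return {"total": len(data), "nul": nul, "non_nul": len(data) - nul,
            "nul_even": nul_even, "nul_odd": nul_odd}


def guess_utf16_ascii(data: bytes) -> Optional[str]:
    """Return 'utf-16-le' or 'utf-16-be' if every other byte is NUL and the
    remaining bytes contain none; otherwise None."""
    if len(data) < 2 or len(data) % 2:
        return None
    half = len(data) // 2
    even, odd = data[0::2], data[1::2]
    if odd.count(0) == half and 0 not in even:
        return "utf-16-le"  # low byte first: b"a\x00"
    if even.count(0) == half and 0 not in odd:
        return "utf-16-be"  # high byte first: b"\x00a"
    return None


def transcode_if_utf16(src: Path, dst: Path) -> Optional[str]:
    """If src looks like ASCII-only UTF-16, write it to dst as UTF-8.
    Returns the detected encoding, or None (and writes nothing)."""
    data = src.read_bytes()
    enc = guess_utf16_ascii(data)
    if enc is not None:
        # Decoding and re-encoding leaves \r\n line endings untouched.
        dst.write_bytes(data.decode(enc).encode("utf-8"))
    return enc
```

The check is deliberately strict: a file beginning with a byte-order mark (bytes 255 254 or 254 255) would fail it, though the hexdump above reports no bytes with value 255.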



© Copyright 1996–2018 StataCorp LLC